Abstract


Current research in specifications is beginning to emphasize the practical use of formal 
specifications in program design. This thesis presents a specification approach, a 
specification language that supports that approach, and some ways to evaluate specifications 
written in that language. 


The two-tiered approach separates the specification of underlying abstractions from the 
specification of state transformations. In this approach, state transformations and target 
programming language dependencies are isolated into an interface language component. All 
interface specifications are built upon shared language specifications that describe the 
underlying abstractions. This thesis presents an interface specification language for the CLU 
programming language and presumes the use of the Larch shared language. 


This thesis also suggests a number of kinds of analyses that one might want to perform 
on two-tiered specifications. These are related to the consistency, completeness, and 
strength of specifications, and are all presented in terms of the theories associated with 
specifications.
1. Introduction


The goal of this thesis is to help people write formal specifications of pieces of large 
software. To achieve this goal, we propose a two-tiered approach for formally specifying the 
behavior of sequential programs, we describe a language that supports this approach, and we
suggest ways to evaluate specifications written in this language.


A specification describes a program's behavior; it is independent of the program itself. It
is formal if it is written in a language with explicitly and precisely defined syntax and
semantics. Two virtues of formal specifications are their precision and amenability to
machine-manipulation.


Current research in specifications is beginning to emphasize the practical use of formal
specifications in the programming process. People have already benefited from using
informal specifications in most phases of this process. Writing informal specifications is
widely accepted as a useful way of organizing ideas, documenting design decisions, and
informally arguing the correctness of programs. Software design methods that include some
form of informal specification have been in use in industry for some time [Caine75,
Jackson75, Katzan76, Yourdon78].


Thus far, formal specifications have played a less influential role in the programming 
process than informal specifications. People have used them with limited success in program 
verification, and have just begun using them in program design. We believe that formal 
specifications can and should play a more important role in the programming process than
they do now.


Using formal specifications early in the programming process, i.e., the design phase, 
should reduce the time, effort, and resources spent in the overall process, especially in the 
costly testing, debugging, and maintenance phases. It is often the act of specifying and not
the final product that is most useful in the design phase. Uncovering bugs early can save the
cost of uncovering them later in the testing and debugging phases. Also, as with informal
specifications, a formal specification serves as a valuable piece of documentation--a means of
communicating between a client and a specifier, between a specifier and programmers, and
among programmers.


There are many problems with trying to use formal specifications during program 
design. Ironically, one is that the need to be precise intimidates many programmers. The 
problem of programmers learning how to read and write formal specifications can be 
gradually overcome. Every programmer has already learned to deal with at least one formal 
language--a programming language. We need to make formal specifications more accessible 
to programmers by supplying an easy-to-learn and easy-to-use specification language, and by
suggesting guidelines for reading and writing specifications.


Another problem is that much of the past research in formal specifications focused on 
theory and not practice, so that specifications of small examples pervade the literature, e.g., the
ubiquitous stack. The result of this theoretical focus is a collection of small and
self-contained specifications of the behavior of well-understood data structures or of small
and simple programs. Small examples are not convincing and the lack of larger ones
reinforces people's reluctance to accept the use of formal specifications. We need to
demonstrate the use of formal specifications on larger examples.


The problem of size has been addressed in programming. In the same way a large 
program is constructed from program modules, the specification of a large program should be 
constructed from specifications of the program modules. This technique introduces the two 
subproblems of how to specify the pieces and how to combine them; this thesis focuses on
the former.


Finally, another problem is that in the development of a specification the specifier is 
usually not provided with any feedback as to whether the specification is in some sense 
"correct." We need to identify and check for properties of the specification that relate to its 
utility. Ideally, we would check individual components of the specification for local properties,
like sufficient-completeness [Guttag75], expressive-richness [Kapur80b], and
implementation-bias [Jones80], and the entire specification for global properties, like
modularity [Parnas72b] and coupling [Myers75]. Since we expect specifications to grow
incrementally, feedback needs to be provided on incomplete specifications.


We organize the rest of this chapter as follows. Section 1.1 contains a statement of the 
problem and the essence of our solution. The next two sections describe in some detail, but 
not formally, the key aspects of the specification approach, and the key features of a 
particular specification language. We define the language precisely in later chapters. 
Section 1.4 contains a discussion on related work. Section 1.5 presents the approach we take 
for providing a formal basis for defining the specification language. It also contains a guide to
the rest of this thesis.
1.1 The Problem 


The main problem specifiers face is that formal specifications are hard to write. The
effort involved in writing them has thus far been disproportionate to the benefit gained from
having written them. We propose one step towards a solution to this problem by providing the
specifier with:

1. A specification approach,
2. A specification language, and
3. Ways to evaluate specifications.

The most significant contribution of this thesis is the specification approach, the
two-tiered approach. It motivates the design of the specification language whose precise
definition constitutes the bulk of this thesis. In this chapter, we discuss the approach and give
an overview of the language; in Chapter 5, we address the evaluation of specifications.


We keep in mind the following two goals. First, we want to make specifications easier for 
programmers to understand. This goal greatly affected our language design. Second, we 
want to make it easier to reason about specifications with sufficient machine support. 
Machine support, such as that provided by a theorem-prover, allows us to infer properties 
about not only the specification, but also what it specifies. This goal greatly affected our
approach to our formalization.
1.2 The Two-Tiered Approach 


Sections 1.2.1 and 1.2.2 describe, in general terms, the two-tiered approach and 
two-tiered specifications, respectively; Section 1.2.3 outlines how a specifier would follow our
approach to write a specification.
1.2.1 The Approach 


The two-tiered approach to specifying programs separates the specification of 
underlying abstractions from the specification of state transformations. We use a shared 
specification language to describe underlying abstractions, and an interface specification 
language to describe state transformations. The specification of a program module is written
in an interface language and consists of two parts: a shared language component (bottom
tier) and an interface language component (top tier). These two components correspond to
the two tiers in our approach.


The interface specification language is programming language dependent, while the 
shared language is programming language independent. This allows us to keep separate the 
description of programming language independent issues from the description of 
programming language dependent ones, e.g., side effects, error handling, and resource
allocation. For example, if we were to implement arithmetic, we would describe ideal
arithmetic in the shared language, and we would describe boundary conditions constrained
by word and memory size in an interface language.
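
The arithmetic example can be sketched in Python (this is not the thesis's notation). Here `ideal_add` plays the role of the shared language tier and `bounded_add` plays the role of the interface tier. The 16-bit word size, the `Overflow` exception, and every name below are assumptions made only for illustration.

```python
WORD_BITS = 16  # hypothetical word size; not taken from the thesis
INT_MIN = -(2 ** (WORD_BITS - 1))
INT_MAX = 2 ** (WORD_BITS - 1) - 1


class Overflow(Exception):
    """Stands in for a language-level termination condition."""


def ideal_add(x: int, y: int) -> int:
    # Bottom tier: unbounded, language-independent arithmetic.
    return x + y


def bounded_add(x: int, y: int) -> int:
    # Top tier: the same abstraction, plus the boundary condition
    # imposed by the (assumed) word size.
    result = ideal_add(x, y)
    if not INT_MIN <= result <= INT_MAX:
        raise Overflow(f"{x} + {y} does not fit in {WORD_BITS} bits")
    return result
```

The point of the split is that `ideal_add` could be reused unchanged by an interface for a machine with a different word size; only `bounded_add` would need to change.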


Since the invention and description of key abstractions is done in the shared language, 
we expect most of the effort involved in writing a specification to be invested in the shared 
language component. The interface language component should deal only with state 
transformations and programming language dependent issues. One reason for separating 
the two language components is that we expect many shared language components to be 
reusable by different interface language components. Some of them will be developed for
particular applications; a few central ones will be useful in many applications.


We use the term "interface" because an interface specification describes all the 
information about the behavior of the program module. Any user of a program module need 
only look at its interface specification to understand the module’s behavior. We use the term
"shared" because in the design of a family of interface languages, each interface language is
derivable from a subset of a target programming language, and a common subset, which is
the shared language.
1.2.2 Two-Tiered Specifications 


In this thesis we focus on the description of an interface language for the programming 
language CLU [Liskov77, Liskov81]. In this section, however, we discuss, in general terms,
syntactic and semantic properties of interface and shared language components.

An interface language component has three parts: a header, a body, and a link to the
shared language component of the specification. The syntax of the header is based on the 
syntax of the programming language. For example, the types of the input and output 
arguments to a procedure are listed in the header information of a procedure specification as 
they would be in an implementation. The body contains first-order assertions written in a 
language based on its shared language component, plus special assertions, which are 
introduced to handle issues dependent on the semantics of the programming language. The 
meaning of the assertions is based on first-order predicate logic with equality, where equality 
is defined by its shared language component. The link identifies the shared language
component to be used.


The crucial syntactic information provided by a shared language component to an 
interface language component is a set of sort identifiers, and a set of function identifiers and 
function signatures. The function identifiers are composed to build terms, which are used to 
write the assertions appearing in the body of an interface language component. The sort 
identifiers and function signatures are used to sort-check terms much in the same way as type
identifiers are used to type-check programs. The crucial semantic information provided by a
shared language component to an interface language component is a theory of equality for
terms.
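
As an illustration of sort-checking, the following Python sketch computes the sort of a term from a table of function signatures. The term representation, the `SIG` table, and the function `sort_of` are hypothetical and are not part of any Larch tool; they only show how signatures can be used in the same way type declarations are used in type-checking.

```python
from typing import Dict, List, Tuple, Union

# A signature maps a function identifier to (domain sorts, range sort).
Signature = Dict[str, Tuple[List[str], str]]

# A term is either a variable name or (function identifier, argument terms).
Term = Union[str, Tuple[str, list]]

# Hypothetical signature in the style of a set trait.
SIG: Signature = {
    "empty": ([], "C"),
    "add": (["C", "E"], "C"),
    "has": (["C", "E"], "Bool"),
}


def sort_of(term: Term, sig: Signature, var_sorts: Dict[str, str]) -> str:
    """Return the sort of a term, or raise TypeError if it is ill-sorted."""
    if isinstance(term, str):
        return var_sorts[term]
    name, args = term
    domain, range_sort = sig[name]
    if len(args) != len(domain):
        raise TypeError(f"{name} expects {len(domain)} arguments")
    for arg, expected in zip(args, domain):
        actual = sort_of(arg, sig, var_sorts)
        if actual != expected:
            raise TypeError(f"{name}: expected sort {expected}, got {actual}")
    return range_sort
```

With `s` of sort `C` and `e` of sort `E`, the term `has(add(s, e), e)` has sort `Bool`, while `add(e, s)` is rejected because its arguments are in the wrong order.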


By explicitly including a shared language component in an interface specification, we 
gain the advantage that every symbol in an assertion is precisely defined within a
specification. In some other specification methods [Hoare72, Parnas72a], there is a reliance
on an interpretation for symbols in an assertion, where the interpretation comes from outside
the specification. For example, the meanings of symbols like ∈ and ⊂ might come from
textbooks on set theory. In contrast, some other methods [Robinson77, Jones81] provide an
assertion language defined within the specification, but restrict the symbols to come from a
fixed set of primitives. We gain the advantage that the user is able to provide just the symbols
necessary to write the assertions in the body of a specification.


1.2.3 Following the Approach 


When a designer begins to write specifications early in the programming process, the 
act of specifying intertwines with the act of designing. One helps the other. We sketch below
a typical top-down design strategy that could be used in following the two-tiered approach.


1. Develop an approximate intuition of the problem to be solved. 
This requires close, often verbal, interaction with the client who is 
posing the problem. 


2. Decide on the major abstractions. 


1. Top tier: Write the header information of 
the interface language components. 


2. Bottom tier: Write the syntactic
information of the shared language
components of the specification, i.e., the
sort identifiers, and function identifiers and
signatures.


3. Fill in the blanks. 


1. Top tier: Fill in the information in the
bodies of the interface language
components of the specification, e.g., write
the assertions in the body of a procedure
specification. Simultaneously generate
additional function and sort identifiers
needed from the shared language
components.


2. Link between top and bottom tiers: 
Define the explicit link to the shared 
language components of the specification. 


3. Bottom tier: Fill in the semantic
information in the bodies of the shared
language components of the specification,
i.e., the theory of equality for terms.


4. Check one’s understanding of the problem and its formalization; 
repeat previous steps until convergence is achieved. 


There are two points worth observing in regard to following this approach, especially for 
large pieces of software. First, as with any overall design method, many iterations over these 
steps may be necessary. Writing a specification sharpens a specifier’s intuition of the 
problem. Hidden design decisions surface. Addressing postponed decisions often requires 
modifications of decisions made earlier. Second, the specifier should be willing to discard
large chunks of a specification in the process of refining the abstractions. This is especially
true after the first iteration. Often after a large investment in time and effort, the specifier (or
designer or programmer) is reluctant to start anew or to try an alternate strategy. With
sufficient machine support the specifier should be able to save time and effort often spent in
managing and maintaining the consistency of a large specification.


During the process of writing a specification, the specifier should also evaluate it for 
certain properties, e.g., consistency and completeness. Checking for these properties as a 
specification develops can increase one’s confidence that a specification is in some sense 
"good." We discuss the evaluation of specifications in Chapter 5. Finally, as with any design, 
the specifier should evaluate the overall structure of the specification, e.g., analyze the 
interconnectivity among its components. We do not address this kind of specification
evaluation in this thesis.
1.3 A Glimpse at a Particular Two-Tiered Specification Language 


In this section we provide an overview of the two-tiered specification language we define 
more precisely in the rest of this thesis. By considering a specific programming language and 
a specific shared language we gain the advantage of being concrete in defining our interface
language.


The interface language we describe is for the programming language CLU. Section 
1.3.1 gives a preview of the CLU interface language, with those concepts from CLU required to
understand the interface language presented as needed.


The shared language we choose is the Larch Shared Specification Language 
[Guttag83a], henceforth referred to as "Larch." Enough similarity between Larch and other 
axiomatic specification languages (see Section 1.4.4 on related work) exists so that a different 
specification language could be used as the shared language. Section 1.3.2 gives an informal 
overview of Larch. We describe only the minimal subset of constructs in Larch needed to 
understand the examples presented in this thesis. Details on Larch can be found in
[Guttag83b].
1.3.1 A Preview of the CLU Interface Language 


CLU has the primitive notions of object and state. An object is an entity that can be
manipulated by a program. Two important properties of an object are its type, which never
changes, and its value, which may change. A state consists of a set of objects, a mapping
from program variables (object identifiers) to objects, and a mapping from objects to values.
Two important observable state changes are when a new object is created and when the
value of an existing object changes. An object whose value can change is said to be mutable.
A type is mutable if objects of that type are mutable.
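
The state model just described can be pictured with a small Python sketch. The class `State`, its field names, and the use of integers as object identities are assumptions made for illustration; they are not CLU constructs.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class State:
    # program variable (object identifier) -> object identity
    env: Dict[str, int] = field(default_factory=dict)
    # object identity -> current value
    store: Dict[int, Any] = field(default_factory=dict)
    # object identity -> type name (fixed when the object is created)
    types: Dict[int, str] = field(default_factory=dict)
    _next_id: int = 0

    def new_object(self, type_name: str, value: Any) -> int:
        """First observable change: a new object is created."""
        oid = self._next_id
        self._next_id += 1
        self.types[oid] = type_name
        self.store[oid] = value
        return oid

    def mutate(self, oid: int, value: Any) -> None:
        """Second observable change: an existing object's value changes."""
        self.store[oid] = value
```

Because variables map to objects rather than directly to values, two variables bound to the same object both observe a mutation of that object.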


It is important not to confuse an object and its type, which are CLU concepts, with a term 
and its sort, which are shared language concepts. The connection between the CLU and the
shared language concepts is that (typed) objects have values that are denotable by (sorted)
terms. Through the interface specifications of procedures and clusters, we establish a link
between the values that objects can have and the terms defined by shared language
components. We establish this link explicitly in the text of the interface specifications.

A CLU program consists of a set of modules, each of which is either a procedure or
cluster. A procedure performs an action on a set of objects, and terminates returning a set of
objects. Communication between a procedure and its invoker generally occurs through these
objects. A cluster names a type and defines a set of procedures that create and manipulate
objects of that type. Users of this type are constrained to treat objects of the type abstractly.
That is, objects can be manipulated only via the procedures defined by the cluster so, in
particular, information about how objects are represented in storage may not be used.


A procedure specification consists of a header, a link to its shared language component, 
and a body. Header information includes the types of the input and output arguments to the 
procedure and a list of possible termination conditions. The link is the name of a shared
language component. Since the unit of encapsulation in Larch is called a trait, we call the link
in an interface specification the used trait. The body of the specification contains two
assertions that correspond to a pre-condition on the state when the procedure is invoked and
a post-condition on the state when the procedure terminates. Terms in these assertions are
constructed from function identifiers provided by the used trait. The pre- and post-conditions
may also contain other special assertions particular to CLU’s semantics.


Figure 1 gives an example of a procedure specification. The identifiers, s and i, that
appear in the header denote objects of type set and int, respectively. The name of the shared
language component is SetOfint, which is choose’s used trait. The pre-condition is satisfied if
the initial value of the input argument is not empty. The post-condition contains an assertion
about the initial and final values of the set object and the final value of the int object. An
object identifier that is followed by an up arrow (↑) denotes the value of that object in the state
upon procedure invocation, i.e., the initial state; one followed by a down arrow (↓) denotes the
value in the state upon procedure termination, i.e., the final state. The function identifiers,
isEmpty, has, remove, and ∧, and the meaning of the equality symbol, =, all come from
SetOfint. The last conjunct in the post-condition, mutates s, is an example of a special
assertion; it states that the choose procedure may mutate no object other than that denoted
by s.


choose = proc (s: set) returns (i: int)
    uses SetOfint
    pre ~isEmpty(s↑)
    post has(s↑,i↓) ∧ s↓ = remove(s↑,i↓) ∧ mutates s
end


Figure 1. Choose Procedure Specification
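
To make the reading of the pre- and post-conditions in Figure 1 concrete, here is a hedged Python sketch that checks one run of a candidate implementation against them. It uses Python's built-in `set` in place of the CLU set type, and `check_choose` and `choose_impl` are hypothetical names. The special assertion "mutates s" (that no other object is mutated) is not checked here.

```python
from typing import Callable, FrozenSet, Set


def check_choose(choose: Callable[[Set[int]], int], s: Set[int]) -> bool:
    """Run `choose` on `s` and test the pre/post-condition pair of Figure 1."""
    s_pre: FrozenSet[int] = frozenset(s)   # value of s in the initial state
    if not s_pre:                          # pre: ~isEmpty(initial s)
        raise ValueError("pre-condition violated: set is empty")
    i_post = choose(s)                     # may mutate s in place
    s_post = frozenset(s)                  # value of s in the final state
    # post: has(initial s, i) and final s = remove(initial s, i)
    return i_post in s_pre and s_post == s_pre - {i_post}


def choose_impl(s: Set[int]) -> int:
    # One possible implementation: remove and return an arbitrary element.
    return s.pop()
```

An implementation that returns a member without removing it fails the check, because the final value of the set would then equal the initial value rather than the initial value with the chosen element removed.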


A cluster specification consists of a header, a link to the shared language component, 
and a body. The header is a list of procedure identifiers. The body of the specification 
consists of a set of procedure specifications. The link from the interface component to the 
shared component is given by a used trait and a provides clause. The used trait supplies all 
function identifiers that appear in the assertions of the procedure specifications of the cluster 
specification. The provides clause gives a mapping from a type identifier to a sort identifier. 
This mapping determines the values over which objects of the type defined by the cluster can 
range. All objects of the type are restricted to values denotable by terms of that sort. The sort 
identifier must appear in the used trait. The provides clause also indicates whether the type
is mutable or not.


Figure 2 gives a skeleton of a cluster specification that defines the type, set. The used 
trait is SetOfint. The provides clause gives a mapping from the type identifier, set, to the sort 
identifier, SI, which comes from SetOfint. The keyword mutable indicates that objects of the
set type are mutable. Specifications for create, insert, remove, and member are of the form
described for procedure specifications.


set = cluster is create, insert, remove, member
    uses SetOfint
    provides mutable set from SI
    create = proc () returns (s: set)
    end
    insert = proc (s: set, i: int)
    end
    remove = proc (s: set, i: int)
    end
    member = proc (s: set, i: int) returns (b: bool)
    end
end


Figure 2. Set Cluster Specification 


1.3.2 An Overview of Larch 


The unit of encapsulation in Larch is called a trait. The identifier appearing before the 
keyword trait is the name of the trait and is distinct from the sort and function identifiers 
appearing in the trait. We will refer to Figures 3 and 4 to help illustrate the meanings of
constructs appearing in traits. We repeat these figures in Appendix I for future reference.


Equivalence: trait
    introduces
        eq: E, E → Bool
    constrains [eq] so that for all [x, y, z: E]
        eq(x,x) = true
        eq(x,y) = eq(y,x)
        ((eq(x,y) ∧ eq(y,z)) ⇒ eq(x,z)) = true


Figure 3. Equivalence Trait 
SetOfE: trait
    includes Integer, Equivalence
    introduces
        empty: → C
        add: C, E → C
        remove: C, E → C
        has: C, E → Bool
        isEmpty: C → Bool
        card: C → Int
    closes C over [empty, add]
    constrains [C] so that for all [s: C, e, e1: E]
        remove(empty, e) = empty
        remove(add(s,e), e1) = if eq(e,e1) then remove(s,e1) else add(remove(s,e1),e)
        has(empty, e) = false
        has(add(s,e), e1) = if eq(e,e1) then true else has(s,e1)
        isEmpty(empty) = true
        isEmpty(add(s,e)) = false
        card(empty) = 0
        card(add(s,e)) = if has(s,e) then card(s) else 1 + card(s)


SetOfint: trait
    includes SetOfE with [SI for C, Int for E]


Figure 4. SetOfE and SetOfint Traits
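
One way to get a feel for the equations of SetOfE is to read them as recursive definitions over terms built from empty and add. The Python sketch below does this for elements of sort Int, with eq taken to be integer equality. The tuple encoding of terms and all Python names are assumptions for illustration; Larch itself gives the equations a logical meaning, not an operational one.

```python
# Terms of sort C are built only from the generators `empty` and `add`:
# () stands for empty, and (s, e) stands for add(s, e).
Term = tuple

empty: Term = ()


def add(s: Term, e: int) -> Term:
    return (s, e)


def eq(x: int, y: int) -> bool:
    return x == y


def remove(s: Term, e1: int) -> Term:
    if s == ():                      # remove(empty, e) = empty
        return empty
    rest, e = s
    if eq(e, e1):                    # then-branch: remove(s, e1)
        return remove(rest, e1)
    return add(remove(rest, e1), e)  # else-branch


def has(s: Term, e1: int) -> bool:
    if s == ():                      # has(empty, e) = false
        return False
    rest, e = s
    return True if eq(e, e1) else has(rest, e1)


def is_empty(s: Term) -> bool:
    return s == ()


def card(s: Term) -> int:
    if s == ():                      # card(empty) = 0
        return 0
    rest, e = s
    return card(rest) if has(rest, e) else 1 + card(rest)
```

For the term add(add(add(empty, 1), 2), 1), `card` yields 2 because the second insertion of 1 is not counted, and `remove` with argument 1 yields add(empty, 2).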


A trait contains a set of function declarations, which follows the keyword introduces,
and a set of axioms, which follows a constrains clause. A function is declared by giving its
name (an identifier) along with its signature, i.e., a domain and range. A domain is a list of
sort identifiers, and a range is a single sort identifier. In the Equivalence trait (Figure 3), the
eq function has two arguments of sort E, and returns a result of sort Bool. All traits may use
boolean connectives, e.g., ∧ and ⇒ in Equivalence, with their usual first-order propositional
logic meanings. Functions can be declared to be mixfix or prefix. For example, if .eq. is to be
used as an infix function, we would write "#.eq.#: E, E → Bool" in its declaration.


There are two kinds of axioms that can appear after a constrains clause. One kind of 
axiom is an equation relating two terms. The " = " symbol denotes an equivalence relation on 
terms. The second kind of axiom, not seen in either Figure 3 or Figure 4, is of the form "τ
exempt" where τ is a term. This indicates that the lack of an equation is not an oversight and
is an aid to "completeness" checking. An example of an axiom of this form is "pop(null)
exempt," which might appear in a trait that defines a theory of stacks.


A function identifier is constrained if it appears in the bracketed list following the 
keyword constrains. If a sort identifier appears in the bracketed list (e.g., in the SetOfE trait 
of Figure 4), each function identifier whose signature contains that sort identifier is 
constrained. A constrains clause indicates the function identifiers that are intended to be
constrained in the equations.


A trait denotes a theory, i.e., a set of formulae closed under a set of inference rules. 
Each equation appearing in a trait is a formula in the trait's theory. An axiom of the form "τ
exempt" adds nothing to a trait’s theory. We can enrich the theory denoted by a set of
equations by adding closes clauses (explained below). Together the introduces,
constrains, and closes clauses, the "inequation" ~(true = false), and propositional and
quantified tautologies define a first-order theory of a trait.


A closes clause adds an inductive rule of inference to a trait. Closing a sort, S, over a
set of function identifiers, F, asserts that there is a representative member, τ, of each
equivalence class of terms of sort S, where each function identifier with range sort S
appearing in τ is in F. The inductive rule of inference is used to add formulae to a trait’s
theory that cannot be shown using purely equational logic. For example, the closes clause in
the SetOfE trait asserts that each term of sort C is equal to a term, τ, where each function
identifier with range sort C appearing in τ is either empty or add. The associated inductive
rule of inference can be used to derive theorems like ∀s:C card(s) ≥ 0.
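
As a worked illustration of the theorem just mentioned, the following gives the usual structural-induction schema for a sort generated by empty and add, followed by the argument for card. The schema shown is the standard one for generated sorts and is offered only as a sketch; the precise rule is the one defined later in the thesis. The predicate P is introduced here for illustration.

```latex
\[
\frac{P(\mathit{empty}) \qquad
      \forall s{:}C,\; e{:}E.\;\; P(s) \Rightarrow P(\mathit{add}(s,e))}
     {\forall s{:}C.\;\; P(s)}
\]

Take $P(s) \equiv \mathit{card}(s) \ge 0$.

\begin{align*}
\text{Base:}\quad & \mathit{card}(\mathit{empty}) = 0 \ge 0. \\
\text{Step:}\quad & \mathit{card}(\mathit{add}(s,e)) =
  \begin{cases}
    \mathit{card}(s)     & \text{if } \mathit{has}(s,e) \\
    1 + \mathit{card}(s) & \text{otherwise}
  \end{cases} \\
& \text{and both cases are} \ge 0 \text{ whenever } \mathit{card}(s) \ge 0.
\end{align*}
```

The step case uses the two card equations of SetOfE together with ordinary facts about integers from the included Integer trait.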


Larch also provides ways of putting traits together, one of which is an includes clause. 
A trait that includes another trait is textually expanded to contain all function declarations, 
constrains clauses, closes clauses, and axioms of the included trait. The meaning of the
including trait is the meaning of the textually expanded trait. In SetOfE, the signature of eq,
which is used in the axioms of SetOfE, comes from that given in the included Equivalence
trait.


Finally, function and sort identifiers that appear in an included trait can be renamed. An 
explicit renaming is given in brackets following the keyword, with. In the SetOfint trait the 
sort identifiers C and E of SetOfE are respectively renamed to be SI and Int. Renaming is used
both to collide identifiers intentionally and to prevent identifiers from colliding.
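
The effect of a renaming on function signatures can be sketched in Python as a substitution on sort identifiers. The dictionary encoding and the function `rename_sorts` are hypothetical; they only mimic the textual renaming described above and cover sort identifiers, not function identifiers.

```python
from typing import Dict, List, Tuple

# function identifier -> (domain sorts, range sort)
Signature = Dict[str, Tuple[List[str], str]]

# A few signatures from SetOfE (Figure 4).
SET_OF_E: Signature = {
    "empty": ([], "C"),
    "add": (["C", "E"], "C"),
    "has": (["C", "E"], "Bool"),
    "card": (["C"], "Int"),
}


def rename_sorts(sig: Signature, renaming: Dict[str, str]) -> Signature:
    """Apply a `with [new for old]` style sort renaming to every signature."""
    def r(sort: str) -> str:
        return renaming.get(sort, sort)
    return {f: ([r(s) for s in dom], r(rng)) for f, (dom, rng) in sig.items()}


# includes SetOfE with [SI for C, Int for E]
SET_OF_INT = rename_sorts(SET_OF_E, {"C": "SI", "E": "Int"})
```

Renaming E to Int is an intentional collision: the element sort becomes the same sort that card already returns. Renaming C to SI, by contrast, keeps the set sort distinct from any other sort that might be called C.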
1.4 Related Work


Work related to this thesis falls into two broad categories: specification languages and 
uses of formal specifications. Various specification languages have developed in parallel with
different roles of formal specifications in the programming process and with the evolution of
higher-level languages. We now discuss each of the following topics as they relate to this
thesis: using specifications in program verification, using specifications elsewhere in program
development, specifying abstract data types, and specification languages.
1.4.1 Program Verification 


Origins of the use of formal specifications can be traced to early work done on proofs of 
program correctness [Floyd67, Hoare69], and later work done on machine-aided program 
verification (e.g., see [King69, Deutsch73, Boyer75, Good75, vonHenke75, London75, 
Suzuki75]). Most of the work is based on Floyd's inductive assertions technique [Floyd67] 
and on Hoare’s axiomatic approach to specifying the meaning of programs [Hoare69] (for an 
excellent review of subsequent developments based on Hoare’s approach, see [Apt81]).
Early proofs were of programs written in simple programming languages (e.g., while
programs) or manageable subsets of higher-level languages like Pascal. Most of the work
does not focus on the approach for the construction of specifications nor on the specification
language itself; in contrast, our work focuses on both.

In the mid 1970's, the focus of program verification turned to problems of specifying
programs using data structures like pointers, arrays, and records [Suzuki76, Luckham76,
Wegbreit76, Reynolds77], and using shared data [Burstall72, Oppen75, Yonezawa77,
Schaffert81]. Of these, Schaffert’s work is most closely related to ours.


Schaffert studies the problem of specifying and verifying programs that use abstract 
data types and shared data with an emphasis on verification. Although his specification 
language is not particular to CLU, its design is motivated by CLU semantics. One difference 
between his specification language and ours is that he combines the specification of 
properties of objects of an abstract data type with the specification of properties of their 
values into one specification rather than separating them into two parts as in our two-tiered
approach. Another difference is that his assertions are not restricted to first-order logic so
mechanization of his proofs would be more difficult than of ours.
1.4.2 Program Development 


Philosophical discussions on the practical use of formal specifications can be found in 
[Parnas77] and, more recently, in [Guttag82]. Guttag and Horning advocate the use of formal 
specifications in the design phase of program development in [Guttag80b], where they hint at 
the two-leveled approach to specifying programs. They specify routines using 
weakest-preconditions [Dijkstra76], but the main example of their paper contains no 
specifications of routines. More importantly, they do not make explicit, as we do, 
programming language dependencies in their routine specifications nor do they make explicit 
a connection between routine specifications and their algebraic specification components.
Jones also advocates the use of formal specifications for program development; his formal
method stems from the Vienna Definition Method (VDM) (see [Bjorner78] for extensive
coverage and related references on VDM).

The use of specifications to enforce "modular" programming gave rise to the distinction
between a "specification part" and "implementation part" in the encapsulation units of 
programming languages such as Mesa modules [Mitchell78] and Ada packages [Ada79]. 
Each encapsulation unit has a specification part that defines how implementation parts of 
other encapsulation units can use it. Specification parts contain syntactic information that 
the compiler can use, such as the types of input and output arguments, and possible 
termination conditions of a procedure, but no formal semantic information about the 
encapsulation unit, such as the input-output behavior of a procedure. The design of the CLU
library includes this kind of specification information as well. Specifications in CLU, however, 
are not part of the syntax of the language. Specifications written in our interface language are 
like "specification parts" except that we provide not only syntactic, but also semantic, 


information about program modules. 
1.4.3 Abstract Data Types 


Formal specifications have been used extensively to describe abstract data types, 
leading to two different approaches, sometimes referred to as “operational” and 
"definitional." A survey of these approaches can be found in [Liskov79]. In the operational 
approach, one gives a method of constructing the abstract data type. Examples of the 
operational approach include Parnas's work on state-machines [Parnas72a], Robinson and
Roubine’s extensions to them with V-, O-, and OV-functions [Robinson77], Berzins’s abstract 


models [Berzins79], and Jones’s model-oriented specifications [Jones80]. 


In the definitional approach, one gives a list of properties of the abstract data type, not a 
method of constructing the type. The definitional approach can be broken into two 
categories, sometimes referred to as "axiomatic" and "algebraic." The axiomatic approach 
stems from Hoare’s work on proofs of correctness of implementations of data types 
[Hoare72], where predicate logic pre- and post-conditions are used for the specification of 


each operation of the type. Other work using the axiomatic approach is in [Standish73] and 
[Nakajima80]. In the algebraic approach, data types are defined to be heterogeneous
algebras [Birkhoff70]. This approach uses axioms to specify properties of abstract data types, 
but the axioms are restricted to equations. Much work has been done on the algebraic 
specification of abstract data types [Goguen75, Guttag75, Zilles75, Burstall77, Ehrich78, 
Wand79, Kamin83], including the handling of error values [Goguen77, Goguen78, Kapur80a],
nondeterminism [Kapur80a], and parameterization [Thatcher78, Goguen81, Ehrig80].


Our work is related to both the axiomatic and algebraic approaches. At the interface 
language level, a cluster specification that defines a data type is written in an axiomatic style 
since pre- and post-conditions are associated with each of the procedure specifications. At 
the shared language level, a trait specification is written in an algebraic style where axioms 


appearing in a trait are restricted to be primarily equational. 


One significant difference between the axiomatic part of our approach and other 
axiomatic approaches is that we define the truth of an assertion with respect to two states. 
Since a program is normally viewed as an input-output relation, a post-condition often needs 
to refer to both the initial and final values of objects. Usual Hoare logic, in which each 
predicate of a triple is interpreted with respect to a single state [Hoare69], uses a standard
trick of introducing free variables in pre-conditions to "save" the initial values. Jones avoids 
this by defining pre-conditions on one state and post-conditions on two [Jones80]. We also 
avoid this by interpreting all assertions, in both pre- and post-conditions, with respect to
two states.
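

The two-state reading of assertions can be illustrated with a minimal sketch in Python. It is not the notation of this thesis: a state is simplified to a dictionary from object names to values, and the names push_post, true_pre, pre_state, and post_state are assumptions made only for this illustration. The post-condition refers directly to both the initial and the final value of an object, so no auxiliary free variable is needed to "save" the initial value.

```python
# Illustrative sketch: every assertion is a predicate over two states.
# A "state" is simplified here to a dict from object names to values.

def true_pre(pre, post):
    """A pre-condition also receives both states; this one ignores them."""
    return True

def push_post(pre, post, elem):
    """Post-condition of a hypothetical stack push: the final value of s
    is the initial value of s with elem appended."""
    return post["s"] == pre["s"] + [elem]

pre_state = {"s": [1, 2]}
post_state = {"s": [1, 2, 3]}

# Both assertions are interpreted with respect to the same pair of states.
holds = true_pre(pre_state, post_state) and push_post(pre_state, post_state, 3)
```

In a single-state logic the same post-condition would need an extra variable bound in the pre-condition to remember the initial value of s.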
1.4.4 Specification Languages 


Much of the work on specification languages has evolved from work done on the 
specification of abstract data types. The more widely-known specification languages that 
have resulted from this research are CLEAR [Burstall77, Burstall81], Iota [Nakajima80], Z
[Abrial80], SPECIAL [Robinson77], and VDM’s Meta-IV [Bjorner78]. CLEAR, Iota, and Z stem
from the definitional approach of describing abstract data types. SPECIAL and Meta-IV stem
from the operational approach, so we discuss them separately from the other three.


CLEAR, Iota, and Z distinguish between a "syntactic part" and a "semantic part" where
the syntactic part defines the signatures of functions. The semantic part of a CLEAR
specification is a set of equations with universally quantified variables, and a possible
induction rule. Models of a theory in CLEAR are based on initial algebras. The semantic part
of an Iota specification is a set of axioms written in first-order predicate logic, and a possible
induction rule. A model for an Iota specification is also an algebra, but since Iota does not
restrict axioms to be equations, the existence of an initial algebra is not guaranteed. The 
semantic part of a Z specification is a set of predicates on sets, relations, and functions. A 

model for a Z specification is a set that satisfies those predicates together with an 


interpretation of the relation and function symbols. 


One important difference between these three specification languages and ours is that 
specifications written in CLEAR, Iota, and Z have no simple way of specifying side effects and
error handling of procedures that implement the specified functions. As stated in Section
1.2.1, we use the interface language component of a two-tiered specification to deal with
issues like side effects and errors. As an intended consequence of our separation of
concerns, CLEAR, Iota, and Z can be substituted for Larch as a shared language, although
doing so would correspondingly change the underlying models of interface specifications.
Each, however, provides the required syntactic and semantic properties of the shared 


language that we discussed in Section 1.2.2. 


SPECIAL’s viewpoint is similar to our two-tiered viewpoint; it separates the "assertion" 
part, analogous to our shared language component, from the "specification" part, analogous 
to our interface language component. A major difference between SPECIAL and our work is 
that in SPECIAL, types used in the specification part are defined in the assertion part. A type 


is restricted to be either a primitive type, a subtype, or a structured type, each of which comes 
with a set of predefined functions. Hence, since the assertion language is so restricted, most
of the work of writing a specification is done in the specification part, where their O-, V-, and 
OV-function definitions correspond to our procedure specifications. We take the opposite 
viewpoint and expect most of the work of writing a specification to be done in the "assertion" 


part (shared language component). 


The most significant difference between Meta-IV, which is the language of the Vienna 
Definition Method, and our language is that we do not use an operational approach to writing
specifications. In Meta-IV, a model of an abstract data type is given in terms of previously
defined types. Constraints on the properties of such a model are given in terms of 
"meta-programs," which include the use of declarations, assignment statements, and 


conditionals. 
1.5 What is in this Thesis 


We reemphasize that the most important contribution of this thesis is the two-tiered
approach and the particular separation made between the two components of a specification. 
This thesis lays out a basis for this approach by formally defining a two-tiered specification 
language (Chapters 2, 3, and 4), and describes ways to evaluate two-tiered specifications 
(Chapter 5). In Section 1.5.1 we discuss our approach to defining the language formally, and 


in Section 1.5.2 we give a guide to the rest of this thesis. 
1.5.1 Approach to the Formalization 


This thesis deals with specifications, i.e., strings of symbols. A string of symbols may be 
viewed in two ways: as a sentence of a language, or as the meaning of that sentence. 
Logicians sometimes call the first point of view "syntactic" and the second point of view 
“semantic.” From the syntactic viewpoint, a precise description of sentences is given by 
defining a formal system: a set of symbols, a set of well-formed formulae, a set of axioms, and


a set of rules of inference. A theory associated with a formal system is the set of well-formed 


formulae derivable from the axioms and rules. From the semantic viewpoint, a precise 
description of sentences is given by defining a model for the language. A model consists of a
universe of mathematical entities such as sets and functions, and a mapping (sometimes
called an interpretation) from sentences in the language to the mathematical entities. These
mathematical entities are called meanings of the sentences.


The syntactic and semantic views are related. A sentence, σ, in a language, L, is valid if
it is true in every model for L. We write "M ⊨ σ" to denote that the sentence σ is true in the
model M (or equivalently, "σ holds in M," "M satisfies σ," and "M is a model of σ"). M is a
model for a set of sentences, Σ, if it is a model for each σ ∈ Σ. Since a theory is a set of
sentences in a language, it also makes sense to talk about a model of a theory.


In this thesis, we concentrate on describing specifications and implementations from a 
syntactic viewpoint because we can treat them as concrete objects, i.e., text written down on 
a piece of paper, as opposed to abstract mathematical entities. Furthermore, we define a 
satisfies relation between an implementation and a specification in terms of their theories.
Chapter 3 contains the definitions of satisfies and the formal systems associated with 


specifications and implementations. 


It is important to establish the soundness of these formal systems. Informally, a formal 
system, F, is sound if no invalid formula is deducible from the axioms and rules of inference of 
F. That is, any theorem in the theory, T, specified by F is valid in all models of T. Formally, F is 
sound if all the axioms of the formal system are valid and the rules of inference are sound. A 


rule is sound if the validity of each of its hypotheses implies the validity of the conclusion. 


Therefore, to show the soundness of the formal systems we will define, it is necessary to 
define (1) the classes of models of the theories of the formal systems and (2) the validity 
relation (⊨) between models and theories. Chapter 2 contains the definitions of these


classes of models, which are the same for specifications as for implementations, and the 


definition of the validity relation for specifications. Although we lay out the foundations to be 
able to prove the soundness of the formal systems we describe, it is outside the scope of this 


thesis to present the proof. 


We choose to present the semantic viewpoint first (Chapter 2) and the syntactic one
later (Chapter 3) because we believe that it is easier to understand the meanings of 
specifications and implementations in terms of familiar mathematical entities such as sets, 
functions, and relations, rather than in terms of strings of symbols and rules that manipulate 
them. We hope that it is easier for the reader to compare whether his intuition matches ours, 
i.e., whether the models we define reflect the same intuitive concepts he has about the 


meaning of a program and its behavior. 
1.5.2 A Guide to the Rest of the Thesis 


In Chapters 2 and 4, we view specifications semantically. We give meanings to 
specifications in terms of mathematical entities that include, among other things, algebras
and relations. In Chapter 2, we define a kernel interface language, and in Chapter 4, we 
define extensions to the kernel. The kernel language is defined to serve as a basis for other 
interface languages and also to reduce the number of linguistic constructs to consider when 
viewing specifications syntactically. The extensions in Chapter 4 are syntactic amenities to 


the kernel and additional constructs to handle particular features in CLU, e.g., iterators. 


In Chapters 3 and 5, we view specifications syntactically. The formal systems associated 
with specifications are defined by using the axiomatic semantics of CLU, which associates 
proof rules with individual CLU statements and expressions, and the semantics of Larch. In 
Chapter 3, we define the theory denoted by a specification written in the kernel interface 
language. In Chapter 5, we describe evaluation properties of specifications in terms of these 


theories. 

Chapters 2 and 3 can be read together for a formal description, in terms of both models
and theories, of the kernel interface language. Chapters 2 and 4 can be read together for a
description of the entire interface language for CLU. Chapters 3 and 5 can be read together 


for an idea of the benefits gained from treating the meanings of specifications as pure text. 


Finally, in Chapter 6 we summarize our conclusions and main contributions of this 


research, and discuss directions for future work. 
2. Kernel Interface Language


This chapter defines a kernel language that can be used to write specifications of CLU 
programs consisting of procedures and clusters. A procedure specification specifies the set 
of procedures that implement it; a cluster specification specifies the set of clusters that 


implement it. 
We would like the kernel language to have the following properties: 


1. Rich enough to allow us to specify any operation or type one 
might want to implement in CLU. 


2. A small number of constructs. In Chapter 4, in order to make 
reading and writing specifications easier, we introduce some 
syntactic sugar and add other constructs to the kernel. The 
additions will be defined by translating them into constructs of the 
kernel language. 


3. A syntax that maps easily into the well-formed formulae of the 
theory that a specification denotes. This is to simplify the formal 
definitions presented in Chapters 3 and 5. 


A goal for the entire interface language, not just the kernel, is that it be adaptable to 
programming languages other than CLU. The particular concrete syntax presented, not 
surprisingly, borrows heavily from CLU, but the abstract syntax of the interface language can 


serve as a basis for an interface language for other programming languages. 


Section 2.1 presents the classes of models for theories associated with specifications 
and implementations. Section 2.2 presents the (kernel) interface language. The two main 
objectives of Section 2.2 are (1) to define the validity relation (⊨) between a model and a
specification, and (2) to present the precise syntax and (model-oriented) semantics of 
procedure and cluster specifications. The presentation is bottom-up. Assertions constitute 
the body of a procedure specification, and procedure specifications constitute the body of a 


cluster specification. Hence, we start by defining an assertion language based on Larch, then 
procedure specifications, then special assertions that are additions to the assertion language
particular to CLU, and finally, cluster specifications. We warn the reader that we sometimes 
digress from our two main objectives of Section 2.2 in order to present some necessary detail 


for the sake of precision. 
2.1 Classes of Models 


A theory defines a class of models. In this section, we are interested in describing the 
classes of models for the theories of specifications and implementations. To do so we use the 
basic mathematical entities of values, functions, and relations to define the notions of objects, 


states, operations, and abstract data types. 


Let us first motivate the kinds of models we will introduce to model the computation of a 
CLU program. The execution of a program begins with the invocation of some operation in 
some initial state. The execution of the operation and of subsequent operations invoked in a 
computation can change the state. We thus need to characterize carefully what information is 
in a state and what possible changes to a state may arise because of the execution of an 
operation. An operation can change a state by creating new objects and changing the values 
of existing ones. Each CLU object can be accessed only through certain operations, 


depending on the abstract data type it belongs to. 


We present our classes of models in a bottom-up fashion: we start off by describing 
values, then objects, states, operations, abstract data types, and finally, computations. In 
Section 2.1.1, we define when an algebra is a model of a trait theory. In Sections 2.1.2 and
2.1.3, we discuss the domains of objects and states, which underlie the models of procedures 
and clusters. In Sections 2.1.4 and 2.1.5, we define the classes of models for procedures and 
clusters, respectively. We call these models operations and abstract data types. The classes 
of models for specifications are the same as for their implementations. The chart in Figure 5 


summarizes the syntactic and semantic domains we will be dealing with. Finally, in Section
2.1.6 we define our model of computation.
Syntactic Conventions 


For an n-tuple, x = <v_1, ..., v_n>, we write x.v_i for the ith component of x. For a function of
one argument, f, we write dom(f) for the domain of f and ran(f) for its range.
2.1.1 Traits and Algebras 


A trait defines a set of equations, propositional formulae, and first-order quantified 
formulae that makes up the trait’s first-order theory with equality. The class of models of the 
theory of a trait is a set of many-sorted algebras. We use the usual definition of satisfaction 
between an algebra and a first-order theory that has equality [Birkhoff70, Enderton72]. We 


define an algebra to be a model of a trait Tr if it satisfies the theory of Tr. 


A many-sorted algebra is a pair consisting of a set of values, Val, partitioned according
to their sorts, and a set of total functions, Fun, over these values. We use the set of terms,
Term, to denote values in Val. Terms are of the form "x" where x is in the set of (sorted)
variable identifiers, VarId, or of the form "f(t1, ..., tn)" where f denotes a function in Fun, and
t1, ..., tn are terms. Let SortId be an infinite set of sort identifiers (not associated with any
particular algebra). Henceforth, when we say "algebra," we mean a many-sorted algebra.


                   Syntax (text)              Semantics (models)

Specifications     Trait                      Algebra = <values, functions>
                   Procedure specification    Operation = <relation, algebra>
                   Cluster specification      Abstract Data Type = <objects, operations>

Implementations    Procedure                  Operation
                   Cluster                    Abstract Data Type


Figure 5. Syntax and Semantics
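

As an informal illustration of how terms denote values in an algebra, the following Python sketch evaluates terms over a toy algebra. The two sorts (Int and Bool), the function names zero, succ, and isZero, and the helper evaluate are assumptions chosen for the example; they do not come from any trait in this thesis.

```python
from dataclasses import dataclass

# A term is a variable identifier or a function application f(t1, ..., tn).
@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class App:
    fun: str
    args: tuple = ()

# Fun: total functions of a toy algebra over Python integers and booleans.
FUN = {
    "zero": lambda: 0,
    "succ": lambda n: n + 1,
    "isZero": lambda n: n == 0,
}

def evaluate(term, assignment):
    """Return the value denoted by a term, given values for its variables."""
    if isinstance(term, Var):
        return assignment[term.name]
    args = [evaluate(a, assignment) for a in term.args]
    return FUN[term.fun](*args)

# The term isZero(succ(x)), evaluated with x denoting 0.
t = App("isZero", (App("succ", (Var("x"),)),))
result = evaluate(t, {"x": 0})
```

The sketch omits sort checking; it only shows that a term built from variables and function symbols picks out one value once the variables are assigned.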
2.1.2 Objects 


Let Obj be an infinite set of objects partitioned into subsets according to their types. 
Each object has exactly one type, which cannot be changed. We call Obj the universe; it is 
the set of all potentially existing objects. A state (defined below) defines a value for each 
object. When an object's value changes, we say the object is "mutated." Let TypeId be an
infinite set of type identifiers (not associated with any particular universe), and let TtoS be a 
many-to-one function that maps type identifiers to sort identifiers. For an object, x, of type T, 


the sort of the value of x is TtoS(T). 


In CLU, an object, A, can be the value of another object, B, in which case we say "A
contains B." Sharing of objects arises when two or more objects contain the same object. 
Because of sharing of mutable objects, it is not sufficient that the value of a containing object 
refer to the value of the contained object; it must refer to the contained object itself, i.e., its 


identity. 
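

The need to refer to identities rather than values can be seen in a small Python sketch. Object identities are represented by strings and the store by a dictionary; the names elem, arr1, arr2, and deref are hypothetical and serve only this illustration.

```python
# Objects are represented by opaque identities (here, strings).
# The store maps each object to its value; the value of a containing
# object mentions the identities of contained objects, not their values.
store = {
    "elem": 1,             # a mutable object whose current value is 1
    "arr1": ("elem",),     # arr1 contains the object elem
    "arr2": ("elem",),     # arr2 contains the same object (sharing)
}

def deref(store, container):
    """Look up the current values of the objects a container holds."""
    return [store[obj] for obj in store[container]]

before = (deref(store, "arr1"), deref(store, "arr2"))
store["elem"] = 2          # mutate the shared object once
after = (deref(store, "arr1"), deref(store, "arr2"))
```

Because both containers hold the identity of elem, a single mutation is visible through each of them. Had the containers stored a copy of the value 1, the mutation would have been lost.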


In order to treat a contained object as part of the value of the containing object, we treat 
objects as special kinds of values. We always include implicitly in every trait a trait defining 
this infinite set of objects. Therefore, any model (i.e., an algebra, A = <Val, Fun>) of the
theory of a trait will have the property that Obj ⊆ Val. Treating objects as values raises a
sticky technical issue: what is the sort of a term that denotes an object? We answer this 


question in Section 2.2.1 where we carefully define how to sort check terms. 
2.1.3 State 


Objects can be created and manipulated in the course of program execution. We model 
the state of a program at an instant in time by a state. We model CLU states as follows, where 


P(Obj) is the powerset of the set Obj. 


State = P(Obj) × Env × Store
Env   = ObjId → Obj
Store = Obj → Val


Def: A state, σ = <O, e, s>, is a triple consisting of a finite set of existing objects, O, which is a
proper subset of Obj; an environment, e, which is a mapping from ObjId to O; and a store, s,
which is a mapping from O to Val.


We call Val the value set of σ. The identifiers in ObjId are CLU program variables, which
always range over objects. Whenever we refer to "an object in σ" we mean an object in σ.O.


We use Σ(Val) to denote the set of states with Val as their value set. That is, Σ(Val) =
{<O, e, s> | s: O → Val}. We do this to avoid having four components in a state. A particular
state, σ, is an element of some set of states, Σ(Val), and thus each state is always associated


with some fixed set of values. 


A state can change over time in three ways: the set of existing objects grows because 
new objects are added from the universe; the environment changes because the mapping 
from CLU program variables (i.e., object identifiers) to objects changes; or the store changes, 


because the values of existing objects change. 
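

The three components of a state and the three ways a state can change may be sketched in Python as follows. The class State and the helpers create, bind, and mutate are assumptions introduced for this illustration, with objects and identifiers represented by strings.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class State:
    objects: frozenset   # O: the finite set of existing objects
    env: dict            # e: maps object identifiers to objects in O
    store: dict          # s: maps objects in O to values

def create(state, obj, value):
    """Grow the set of existing objects with a new object from the universe."""
    assert obj not in state.objects
    return State(state.objects | {obj}, dict(state.env), {**state.store, obj: value})

def bind(state, ident, obj):
    """Change the environment: make the identifier ident denote obj."""
    assert obj in state.objects
    return State(state.objects, {**state.env, ident: obj}, dict(state.store))

def mutate(state, obj, value):
    """Change the store: give an existing object a new value."""
    assert obj in state.objects
    return State(state.objects, dict(state.env), {**state.store, obj: value})

s0 = State(frozenset(), {}, {})
s1 = create(s0, "o1", 10)     # the set of existing objects grows
s2 = bind(s1, "x", "o1")      # the environment changes
s3 = mutate(s2, "o1", 11)     # the store changes
```

Each helper returns a new state and leaves its argument untouched, which keeps earlier states available for comparison with later ones.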
2.1.4 Procedures and Operations 


We model a procedure as an operation, where an operation is a pair, <R, A>, consisting 
of a relation and an algebra. We refer to the relation of an operation modeling a procedure as 


the input-output behavior of the procedure. A relation, R, is a set of pairs of states: 
R ⊆ Σ(Val) × Σ(Val) where A = <Val, Fun>


We call the first component of a pair in the relation the input state; the second, the 
output state. Let dom(R) be the set of input states of R; ran(R) be the set of output states of R. 
The relation viewed as a set of pairs of states is more general than we need. In particular, we 


can and should be specific about the arguments passed to and from a procedure. 

Def: The object identifiers in a procedure heading are input formals of the procedure. The
objects the formals denote are input arguments of the procedure. The objects returned by a
procedure are output arguments. 


A relation, R, which is a component of an operation, has the following properties: 


1. dom(R) = {<O, e, s> | dom(e) = set of input formals ∧
                         ran(e) = set of input arguments}
2. ran(R) = {<O, e, s> | ran(e) = set of output arguments}


where dom(e) is the domain of the environment e, and ran(e) is the range. The first property 
states that the environment of all input states is the set of bindings from input formals (object 
identifiers) of a procedure to the arguments passed to it. The second property states that the 
range of the environment of all output states is the set of output arguments. (CLU procedures 
do not list identifiers for output arguments. Since our specifications do, we will strengthen the 


second property when we define a model of a procedure specification.) 


The algebra A of a model of a procedure provides the set of values, Val, over which
objects manipulated by the procedure can range. Val is the same set as the value set of each


state of the pairs in the relation. 
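

A relation of this kind can be sketched in Python as a set of pairs of hashable states. Everything named below (make_state, env_dom, env_ran, the identifier "result", and the toy value set 0..9) is an assumption made for the example, not part of the formal model.

```python
def make_state(objects, env, store):
    """Build a hashable state <O, e, s> from a set and two dicts."""
    return (frozenset(objects), frozenset(env.items()), frozenset(store.items()))

def env_dom(state):
    """Domain of the environment: the identifiers it binds."""
    return {ident for ident, _ in state[1]}

def env_ran(state):
    """Range of the environment: the objects it denotes."""
    return {obj for _, obj in state[1]}

def dom(relation):
    return {inp for inp, _ in relation}

def ran(relation):
    return {out for _, out in relation}

# Hypothetical procedure with one input formal "x" that returns a new object "b".
s_in = make_state({"a"}, {"x": "a"}, {"a": 3})
s_out = make_state({"a", "b"}, {"result": "b"}, {"a": 3, "b": 4})
R = {(s_in, s_out)}

VAL = range(0, 10)   # toy value set supplied by the algebra

formals = {"x"}
inputs_ok = all(env_dom(s) == formals for s in dom(R))
outputs_ok = all(env_ran(s) == {"b"} for s in ran(R))
values_ok = all(v in VAL for s in dom(R) | ran(R) for _, v in s[2])
```

The three checks correspond to the first property on dom(R), the second property on ran(R), and the requirement that every state in the relation draws its values from the value set of the algebra.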


Procedures can terminate in more than one way. Let TermCond be a set of special 
values called termination conditions, and let terminates be a special object in the state that 
can take on a value from TermCond. For simplicity, we henceforth assume that the trait
defining the values in TermCond is included implicitly in all traits and that terminates ∈ O for
all states <O, e, s>. We reserve the special value normal for the normal termination condition.
A procedure may also never terminate. For a given input state, if the set of output states is
non-empty, then the procedure must terminate for that input state.¹


1. In CLU, a procedure may also terminate because of an unhandled exception, thereby signaling failure. We view
this situation as a programmer error and we choose not to provide the ability to specify such procedures. Hence, a
procedure that signals failure satisfies no specification.


2.1.5 Clusters and Abstract Data Types 


We model a cluster as an abstract data type, where an abstract data type is a pair, T =
<Obs, Ops>, consisting of a set of objects and a set of operations. The set of objects, Obs, is
the subset of the objects of Obj whose elements are of type T. An operation in Ops is a pair 
consisting of a relation and an algebra, as previously defined. We require that all the 


operations of the type have the same algebra. 
2.1.6 Computations 


We model a computation as an alternating sequence of states and statements starting in
some initial state, σ_0. Each statement, S, of a computation sequence is a partial function on
states:

S: Σ(Val) → Σ(Val)

For the states, σ_i, and the statements, S_i, 1 ≤ i ≤ n, let a computation sequence be:

σ_0 S_1 σ_1 ... σ_{n-1} S_n σ_n

and for all 1 ≤ i ≤ n, <σ_{i-1}, σ_i> ∈ S_i. We refer to the states σ_0, ..., σ_n above as "states of a
computation sequence." We could also view a computation sequence as a sequence of 
states, and dispense with references to individual statements. However, in defining 
computational induction, which we do in Chapter 3, we need to be able to refer to the 


statements that cause the changes to states. 
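

A computation sequence can be sketched in Python by treating each statement as a function from a state to a state that returns None where it is undefined. The helper run, the statements invoke_new and assign_x, and the choice to return the prefix of states reached before an undefined step are all assumptions of this sketch.

```python
def run(initial, statements):
    """Return the states of a computation sequence that starts in initial,
    stopping early if a statement is undefined on the current state."""
    states = [initial]
    for stmt in statements:
        nxt = stmt(states[-1])
        if nxt is None:        # partial function: no successor state
            break
        states.append(nxt)
    return states

def invoke_new(state):
    """Procedure invocation: adds a new object 'o1' with value 0."""
    return {"objects": state["objects"] | {"o1"},
            "env": dict(state["env"]),
            "store": {**state["store"], "o1": 0}}

def assign_x(state):
    """Assignment x := o1; undefined if o1 does not exist yet."""
    if "o1" not in state["objects"]:
        return None
    return {"objects": state["objects"],
            "env": {**state["env"], "x": "o1"},
            "store": dict(state["store"])}

sigma0 = {"objects": frozenset(), "env": {}, "store": {}}
trace = run(sigma0, [invoke_new, assign_x])   # both statements are defined
stuck = run(sigma0, [assign_x, invoke_new])   # first statement is undefined
```

Keeping the list of statements alongside the list of states is what allows a later argument to name the statement responsible for each change of state.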


We are interested in only two kinds of CLU statements: assignment and procedure 
invocation. All other statements can be defined in terms of these two. In CLU, a simple 
assignment statement can change the environment of a state by changing the mapping from 
an object identifier to an object. A procedure invocation can change the set of existing


objects of a state by adding new objects to it, and it can change the store of a state by 
changing the values of objects. All objects returned from a procedure as a result of a
procedure invocation can be assigned to object identifiers in an assignment statement. So, 
when assignment is combined with procedure invocation, an assignment statement, in 


general, can change all components of a state. 
Properties of Computations 


1. Successive states: A property that holds between two successive states of all 


computation sequences is: 
∀ 1 ≤ i ≤ n, σ_{i-1}.O ⊆ σ_i.O.


This property states that new objects can possibly be added to, but not removed from, a state 


as a result of a procedure invocation. 


2. Procedure invocation: For all 1 ≤ i ≤ n, if S_i is or contains the invocation of a
procedure, Pr, the following two properties hold. Let Op = <R, A> be the operation modeling
Pr. For all <in, out> pairs of states in R (recall that the range of an environment is a set of
objects):


2.1. ran(in.e) ∪ {Pr} ⊆ σ_{i-1}.O
2.2. ran(out.e) ⊆ σ_i.O


The first property states that all input arguments and the procedure Pr are in the set of 
existing objects of the state before the invocation of Pr. Pr is included because a procedure is 
also an object in CLU and must exist before it is invoked. The second property states that all 


output arguments are in the set of existing objects upon the termination of Pr. 
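

The two properties of computations can be checked mechanically on small examples. The Python sketch below works only with the sets of existing objects of successive states; the function names successive_states_ok and invocation_ok and the sample objects are assumptions made for the illustration.

```python
def successive_states_ok(object_sets):
    """Property 1: each state's set of existing objects contains the
    set of existing objects of the state before it."""
    return all(prev <= cur for prev, cur in zip(object_sets, object_sets[1:]))

def invocation_ok(before_objs, after_objs, in_args, out_args, proc):
    """Properties 2.1 and 2.2 for one invocation of proc: the input
    arguments and proc exist before, the output arguments exist after."""
    return (set(in_args) | {proc}) <= before_objs and set(out_args) <= after_objs

# Sets of existing objects of three successive states.
sets = [{"Pr", "a"}, {"Pr", "a", "b"}, {"Pr", "a", "b"}]

grows = successive_states_ok(sets)
shrinks = successive_states_ok([{"a", "b"}, {"a"}])
call_ok = invocation_ok(sets[0], sets[1], in_args={"a"}, out_args={"b"}, proc="Pr")
call_bad = invocation_ok({"a"}, {"a", "b"}, in_args={"a"}, out_args={"b"}, proc="Pr")
```

The last call fails only because the procedure object itself is missing from the state before the invocation, which is the case the first invocation property rules out.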


We summarize the models we have described in Section 2.1 in Figure 6. 


Syntax       Semantics


Trait        A model of a trait is a (many-sorted) algebra,
             where for an algebra A = <Val, Fun>,
             Val is a set of values and Fun is a set of functions.

Procedure    A model of a procedure is an operation,
             where for an operation Op = <R, A>,
             R is an input-output relation on pairs of states (see below),
             and A is an algebra.

Cluster      A model of a cluster is an abstract data type,
             where for a type T = <Obs, Ops>,
             Obs is a set of objects (of type T), and Ops is a set of operations.


Some Syntactic Domains

SortId       set of sort identifiers
TypeId       set of type identifiers
ObjId        set of object identifiers


Some Semantic Domains

State = P(Obj) × Env × Store

Σ(Val)       set of states over value domain, Val
Obj          set of all potentially existing objects
TermCond     set of termination conditions


Facts

For all states, σ = <O, e, s>, where σ ∈ Σ(Val),

O ⊆ Obj            set of existing objects
e: ObjId → O       an environment
s: O → Val         a store

TermCond ⊆ Val
terminates ∈ O
normal ∈ TermCond


Figure 6. Summary of Models, Syntactic and Semantic Domains


2.2 Kernel Interface Language and Models 


We now turn to describing in detail the interface language. We have already defined the 
underlying models for traits, described the domains of objects and states, and described the 
underlying models for procedures and clusters. What remains is to present the syntax of the 
kernel language and to define the validity relationship (⊨), which we do in Section 2.2.2 for
procedure specifications and in Section 2.2.3 for cluster specifications.
Syntactic Conventions 


We use extended BNF to define the syntax of our language with the following syntactic 


conventions: 
|      alternative separator
a+     one or more a’s
a+,    one or more a’s separated by commas
<a>    an optional a
Nonterminals are italicized. Terminal symbols include parentheses, square brackets, curly 
braces, and boldface items. Comments in specifications begin with "%" and end with a 


newline. 


In the next three sections, 2.2.1 through 2.2.3, we describe the interface assertion 
language, procedure specifications, and cluster specifications. Section 2.2.1 contains the 
basis of the assertion language for writing the bodies of procedure specifications. Section 
2.2.2 on procedure specifications is further broken down into five subsections describing 
various parts of the interface language that are germane to procedures. It introduces special 
assertions that are additions to the base assertion language described in Section 2.2.1. In 
Sections 2.2.2 and 2.2.3, for each part of the interface language we will present four sections: 
its syntax, its syntactic checks, its meaning, and an example. Some of the syntactic checks 
that we require would be unnecessary if we added more complexity to the grammar that we 


present. We choose not to put the complexity in the grammar in order to simplify our 


description of the meanings of the various parts of the language. 
2.2.1 Interface Assertion Language 


In this section we describe the language we use to make assertions about objects and 
their values in a state. These assertions appear in the bodies of specifications and can refer 
to both initial and final values of objects. After presenting the syntax of interface assertions, 
we present a lengthy section on the syntax checking of assertions. It is long because we 
discuss in depth the issue of sort checking a term that refers to an object. Finally, we present 
the meaning of an interface assertion by giving a truth value function. Since an assertion can 
refer to the initial and final value of an object, the truth function is defined with respect to two 


states, corresponding to the input and output states of an input-output relation. 
Syntax 


Assn       ::= true | false | ~Assn | Assn Connective Assn | (Assn)
             | Quantifier VarId: SortId Assn
             | Term = Term
Term       ::= VarId | ObjId | OpId<(Term+,)> | Termt | Terms
Connective ::= ∧ | ∨ | ⇒ | ⇔
Quantifier ::= ∀ | ∃


We allow parentheses to be omitted by relying on the following conventions:
1. Outermost parentheses may be dropped.
E.g., "A ∧ B" is "(A ∧ B)."
2. The precedence of the operators and quantifiers from highest to
lowest is ~, ∀, ∃, ∧, ∨, ⇒, ⇔.
E.g., "∀x A ⇒ B" is "(∀x A) ⇒ B," and not "∀x (A ⇒ B)"; "~A ∧ B ⇒
C" is "((~A) ∧ B) ⇒ C."
3. When one connective is used repeatedly, the expression is
grouped to the right.
E.g., "A ⇒ B ⇒ C" is "A ⇒ (B ⇒ C)."


We allow the use of other delimiters, such as square brackets, for parentheses. An assertion - 


of the form 7 = true is abbreviated to 7; 7 = false, ~7, where 7 is in Term. 

Assertions in specifications can refer to both the initial and final values of objects. We
use x↑ to denote the initial value and x↓ to denote the final value of object x. The
interpretation of these terms will be defined rigorously in the Meaning section.


In order to define precisely how to sort check an assertion we need to define the
subterms of an assertion or term:


Def: The subterms of an assertion, α, in Assn are defined as follows:

1. α is a subterm of itself.

2. If α is of the form t1 = t2, the subterms of both t1 and t2 are subterms of α.

3. If α is of the form ~a, the subterms of a are subterms of α.

4. If α is of the form a1 # a2, where # is in Connective, the subterms of both a1 and
a2 are subterms of α.

5. If α is of the form (a), the subterms of a are subterms of α.

6. If α is of the form ∀v:S a or ∃v:S a, the subterms of a are subterms of α.


Def: The subterms of a term, τ, in Term are defined inductively as follows:

1. τ is a subterm of itself.

2. If τ is of the form f(t1, ..., tn), where f is in Opid and t1, ..., tn are in Term, the
subterms of t1, ..., tn are subterms of τ.

3. If τ is of the form t↑ or t↓, the subterms of t are subterms of τ.
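
The inductive definition of the subterms of a term can be sketched in code. The following is only an illustration under assumptions that are not part of the interface language: terms are encoded as small Python classes (Ident, Apply, Initial, Final are hypothetical names), and identifiers carry no sort information.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Ident:
    """A variable or object identifier (illustrative encoding)."""
    name: str


@dataclass(frozen=True)
class Apply:
    """An operator application f(t1, ..., tn)."""
    op: str
    args: Tuple[object, ...]


@dataclass(frozen=True)
class Initial:
    """The initial value of the object denoted by `of`."""
    of: object


@dataclass(frozen=True)
class Final:
    """The final value of the object denoted by `of`."""
    of: object


def subterms(term):
    """List the subterms of `term`, one branch per clause of the definition."""
    found = [term]                              # clause 1: the term itself
    if isinstance(term, Apply):                 # clause 2: arguments of f
        for arg in term.args:
            found.extend(subterms(arg))
    elif isinstance(term, (Initial, Final)):    # clause 3: initial/final value
        found.extend(subterms(term.of))
    return found


# fetch applied to the initial values of objects a and i
a, i = Ident("a"), Ident("i")
fetch_term = Apply("fetch", (Initial(a), Initial(i)))
print(len(subterms(fetch_term)))
```

The term itself, its two arguments, and the two identifiers inside them are all reported as subterms.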


Checking 


We check that all assertions sort check, where all trivial subterms, i.e., terms that are in
either Varid or Objid, sort check. The second definition below relies on understanding the
discussion, Sorts for Objects and Values; we present it here to keep the definitions involving
the syntax checking of an assertion together.


Def: An assertion, α, sort checks if and only if:
1. If α is of the form t1 = t2, the sorts of both t1 and t2 are the same.
2. All subterms of α sort check.


Def: A term, τ, sort checks if and only if:
1. All subterms of τ sort check.
2. If τ is of the form g(s1, ..., sm), where g is in Opid and s1, ..., sm are in Term, the
domain of g must be a sequence of the sorts of the m terms in s1, ..., sm where
a. The sort of a term of the form f(t1, ..., tn) is the range of f, where f is in
Opid and t1, ..., tn are in Term,
b. The sort of a term of the form v is S, where v is in Varid and is bound in an
assertion of the form ∀v:S a or ∃v:S a, for a in Assn,
c. The sort of a term of the form o is the sort T_obj where o is in Objid and T
is the type of the object denoted by o, and
d. The sort of a term of the form t↑ or t↓ is the sort TtoS(T) where t is in Term
and T is the type of the object denoted by t.
3. If τ is of the form t↑ or t↓, t must denote an object, where t is in Term.
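
The sort rules for terms can be illustrated with a small checker. This is a sketch under stated assumptions, not the thesis's algorithm: terms are encoded as tuples, sorts are plain strings, and the tables SIGNATURES, TTOS, and OBJ_TYPES are hypothetical data modeled on the array-of-sets example used in this section.

```python
# Hypothetical operator signatures: name -> (domain sorts, range sort).
SIGNATURES = {
    "create": (["Int"], "A"),
    "addh": (["A", "set_obj"], "A"),
    "fetch": (["A", "Int"], "set_obj"),
}
# Assumed type-to-sort mapping and object identifier types.
TTOS = {"array[set]": "A", "set": "S", "integer": "Int"}
OBJ_TYPES = {"a": "array[set]", "i": "integer"}


def is_obj_sort(sort):
    """Obj sorts are named T_obj for a type T."""
    return sort.endswith("_obj")


def sort_of(term, var_sorts):
    """Return the sort of a tuple-encoded term, or raise TypeError."""
    kind = term[0]
    if kind == "var":                       # rule b: bound variable
        return var_sorts[term[1]]
    if kind == "obj":                       # rule c: object identifier
        return OBJ_TYPES[term[1]] + "_obj"
    if kind == "app":                       # rules 2 and a: operator application
        _, op, *args = term
        domain, rng = SIGNATURES[op]
        arg_sorts = [sort_of(arg, var_sorts) for arg in args]
        if arg_sorts != domain:
            raise TypeError(f"{op} expects {domain}, got {arg_sorts}")
        return rng
    if kind in ("init", "final"):           # rules d and 3: value of an object
        inner = sort_of(term[1], var_sorts)
        if not is_obj_sort(inner):
            raise TypeError(f"{term[1]!r} does not denote an object")
        return TTOS[inner[: -len("_obj")]]
    raise ValueError(f"unknown term: {term!r}")


# fetch applied to the initial values of the objects a and i
fetch_term = ("app", "fetch", ("init", ("obj", "a")), ("init", ("obj", "i")))
print(sort_of(fetch_term, {}))
print(sort_of(("init", fetch_term), {}))
```

Taking the value of a term whose sort is not an obj sort is rejected, which corresponds to the third clause of the definition.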


Sorts for Objects and Values 


We now address the sticky technical issue raised earlier in Section 2.1.2 where we
discussed objects: if an object is a value, what is the sort of a term denoting such a value?
Before we answer this, let us look at an example. Let the value of some array (of sets) object
be denoted by the term addh(addh(create(1),s1),s2), where the signatures of the trait
function identifiers addh and create are:

create: Int → A
addh: A, ? → A

What sort is "?"? The object identifiers s1 and s2 denote objects since the value of an array
object refers to the set objects the array contains, not just the values of the set objects.


We introduce a special subset of Sortid called ObjSortid. For each different type in the
set, Obj, there is a sort identifier in ObjSortid. Each sort identifier in ObjSortid is called an obj
sort; each in Sortid is called a value sort. (Just as an object is a special kind of value, an obj
sort is a special kind of value sort.) So, in our array example, s1 and s2 are of some obj sort.


Therefore, an object has two sorts associated with it: its obj sort and its value sort. The
sort of a term denoting the value of an object is a value sort--it can be an obj sort since objects
can contain other objects. The sort of a term denoting the object itself must be an obj sort.
There is a one-to-one correspondence between the type of an object and its obj sort. We use
the naming convention that T_obj is the name of the obj sort for objects of type T. In our array
value example, s1 and s2 are of the obj sort, set_obj. There is a one-to-one correspondence
between the type of an object and the sort of a term denoting its value. The function, TtoS,
gives us this mapping from type names to (value) sort names. (TtoS can be many-to-one
because more than one type can be defined with respect to the same sort.) In our array
example, the term addh(addh(create(1),s1),s2) is of (value) sort, A.


We emphasize that the reason we introduce an obj sort of the form "T_obj" instead of
simply using the type identifier "T" is to keep the set of sort identifiers disjoint from the set of
type identifiers. We do this to be consistent with the facts that the set of values, Val, is
partitioned by sorts and the set of objects, Obj, is partitioned by types. We also emphasize
that the only reason we need to introduce obj sorts for objects is that objects are treated as
values (because of sharing and mutability); for sort checking to work, we need to be able to
refer sensibly to "the sort of an object," or more precisely, "the sort of a term denoting an
object."


Def: A term denotes an object if and only if the sort of the term is some obj sort.


Figure 7 summarizes the various sets of identifiers for objects, values, obj sorts, value
sorts, and types; some facts relating these sets; and some questions that are reasonable to
ask of objects and values, and their answers.
Returning to the array example, the signature of the addh function is:
addh: A, set_obj → A
Suppose we also have a fetch function for arrays with the following signature:
fetch: A, Int → set_obj
with TtoS defined as follows:


TtoS(array[set]) = A
TtoS(set) = S
TtoS(integer) = Int


Syntactic Domains

Varid       variable identifiers denoting values, some of which may be objects
Objid       object identifiers denoting objects, which are special kinds of values
Sortid      value sort identifiers
ObjSortid   obj sort identifiers, each of the form T_obj, for type identifier T
Typeid      type identifiers

Facts

Varid ∩ Objid = ∅
Sortid ∩ Typeid = ∅
ObjSortid ⊆ Sortid
|Typeid| = |ObjSortid|, where "|X|" is the cardinality of set X.
∃ bijection: Typeid ↔ ObjSortid
∀T∈Typeid ∃S∈Sortid TtoS(T) = S

Questions                                             Answers

For an object, x, of type T:

What is the type of x?                                T
What is the value of x in a state, σ = <O, e, s>?     σ.s(x)
What is the obj sort of object x?                     T_obj
What is the value sort of the value of x?             TtoS(T)


Figure 7. Sorts and Types, Objects and Values


For an array[set] object, a, let a↑ be the value of a, and for an integer object, i, let i↑ be the
value of i:


The type of a is array[set].
The obj sort of a is array[set]_obj.
The (value) sort of the value of a is A.


The type of the object denoted by fetch(a↑,i↑) is set.
The obj sort of fetch(a↑,i↑) is set_obj.
The (value) sort of fetch(a↑,i↑)↑ is S.


Suppose instead that addh and fetch were declared as:


addh: A, S → A
fetch: A, Int → S


In this case, it would not make sense to ask for the type of fetch(a↑,i↑) since fetch(a↑,i↑) does
not denote an object. It does make sense to ask for the sort of fetch(a↑,i↑); the sort is S.
An Important Shorthand 


It is important to realize that we can quantify over objects because we are treating
objects as values. It makes sense to write an assertion ∀x:T_obj a or ∃x:T_obj a, where x
ranges over objects of type T and a is in Assn. In our examples, we abbreviate these to the
forms ∀x:T a and ∃x:T a.
Meaning 


Assertions are well-formed formulae in first-order predicate calculus with equality,
where equality is denoted by the symbol, =. We will define the truth of an assertion with
respect to two states, an algebra, and a variable-to-value mapping. Before we define the truth
function, T, we explain why we need these various pieces of information.


As mentioned in the beginning of Section 2.2.1, we need to interpret interface assertions
with respect to two states because assertions in specifications can refer to both the initial and
final values of objects. The two states correspond to the input state and the output state in a
relation of an operation.


A model of a procedure specification is an operation that includes the same algebra
used to interpret an interface assertion. The algebra provides a set of values, Val, and a set of
functions, Fun, to which we refer below.


Finally, in order to handle the free variables in an assertion, we include a
variable-to-value mapping. This is a standard "trick" used to keep track of the variable
identifiers that are introduced in quantified assertions. (The following definition is adapted
from [deBakker80].)


Def: Let VarMap be the set of functions, μ: Varid → Val (the same Val as for the algebra
discussed above). For all μ∈VarMap, v∈Varid, x∈Val, we write "μ[x/v]" (read "substitute x
for v in μ") for the element of VarMap that satisfies, for each y∈Varid:


1. μ[x/v](y) = x, if y = v
2. μ[x/v](y) = μ(y), if y ≠ v
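
The substitution on variable-to-value mappings is an ordinary functional update. A minimal sketch, assuming a mapping with finitely many variables is represented as a Python dictionary (the function name substitute is illustrative):

```python
def substitute(mapping, x, v):
    """Return a new mapping that sends v to x and agrees with `mapping` elsewhere."""
    updated = dict(mapping)   # copy, so the original mapping is left unchanged
    updated[v] = x
    return updated


mu = {"i": 1, "j": 2}
mu_prime = substitute(mu, 7, "i")
print(mu_prime["i"], mu_prime["j"], mu["i"])
```

Returning a fresh mapping rather than updating in place matters when a quantified assertion is evaluated: each candidate value for the bound variable must be tried against the same outer mapping.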


We are now ready to give the truth function, T.
T: Assn × Σ(Val) × Σ(Val) × Alg × VarMap → {TRUE, FALSE}.


We write "T[P](σ, σ', A, μ)" for the truth of an assertion P in states, σ, σ'; algebra, A; and
variable-to-value mapping, μ. The states σ and σ' are elements of Σ(Val), where Val is the
same set Val as for the algebra A. For all a, a1, a2 ∈ Assn, and t1, t2 ∈ Term,


T[true](σ, σ', A, μ) = TRUE
T[false](σ, σ', A, μ) = FALSE
T[~a](σ, σ', A, μ) = ~T[a](σ, σ', A, μ)
T[a1 # a2](σ, σ', A, μ) = T[a1](σ, σ', A, μ) # T[a2](σ, σ', A, μ),
    where # is in Connective.
T[(a)](σ, σ', A, μ) = T[a](σ, σ', A, μ)
T[∀v:S a](σ, σ', A, μ) = ∀x:S T[a](σ, σ', A, μ[x/v]),
    where x is of sort S and does not appear free in a.
T[∃v:S a](σ, σ', A, μ) = ∃x:S T[a](σ, σ', A, μ[x/v]),
    where x is of sort S and does not appear free in a.
T[t1 = t2](σ, σ', A, μ) = TRUE, if V[t1](σ, σ', A, μ) = V[t2](σ, σ', A, μ);
                          FALSE, otherwise;
    where "=" between values is the equality relation on values in algebra, A.

The value of a term is defined by the following function,
V: Term × Σ(Val) × Σ(Val) × Alg × VarMap → Val.
For all y∈Varid, x∈Objid, f∈Opid, and t, t1, ..., tn ∈ Term,


V[y](σ, σ', A, μ) = μ(y)

V[x](σ, σ', A, μ) = x, where x is neither an input nor output formal

V[x](σ, σ', A, μ) = σ.e(x), where x is an input formal

V[x](σ, σ', A, μ) = σ'.e(x), where x is an output formal

V[f(t1, ..., tn)](σ, σ', A, μ) = f!(V[t1](σ, σ', A, μ), ..., V[tn](σ, σ', A, μ))
    where f! is the function ∈ A.Fun denoted by f.


V[t↑](σ, σ', A, μ) = σ.s(V[t](σ, σ', A, μ))
V[t↓](σ, σ', A, μ) = σ'.s(V[t](σ, σ', A, μ))


Example 


As an example, let us apply the value function, V, to the term, fetch(a↑,i↑), where a and i
are input formals of a procedure specification.
V[fetch(a↑,i↑)](σ, σ', A, μ)
    = fetch!(V[a↑](σ, σ', A, μ), V[i↑](σ, σ', A, μ))
    = fetch!(σ.s(V[a](σ, σ', A, μ)), σ.s(V[i](σ, σ', A, μ)))
    = fetch!(σ.s(σ.e(a)), σ.s(σ.e(i)))


Here, fetch! is a function in A.Fun; σ.s(σ.e(a)) and σ.s(σ.e(i)) are values in A.Val.
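
The same evaluation can be traced in code. The sketch below is an illustration under assumptions of our own: a state is a record with an object set, an environment, and a store; terms are tuples; an array value is modeled as a pair (low bound, tuple of elements); and the object names obj_a, obj_i, obj_s are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class State:
    objects: set   # O: the existing objects
    env: dict      # e: object identifier -> object
    store: dict    # s: object -> value


def value(term, pre, post, fun, mu, inputs, outputs):
    """Evaluate a tuple-encoded term in the input state pre and output state post."""
    kind = term[0]
    if kind == "var":
        return mu[term[1]]
    if kind == "obj":
        name = term[1]
        if name in inputs:
            return pre.env[name]
        if name in outputs:
            return post.env[name]
        return name                      # neither an input nor an output formal
    if kind == "app":
        args = [value(t, pre, post, fun, mu, inputs, outputs) for t in term[2:]]
        return fun[term[1]](*args)
    if kind == "init":                   # initial value: look up in the input store
        return pre.store[value(term[1], pre, post, fun, mu, inputs, outputs)]
    if kind == "final":                  # final value: look up in the output store
        return post.store[value(term[1], pre, post, fun, mu, inputs, outputs)]
    raise ValueError(f"unknown term: {term!r}")


# Assumed interpretation of fetch on the (low bound, elements) representation.
fun = {"fetch": lambda arr, idx: arr[1][idx - arr[0]]}
pre = State(
    objects={"obj_a", "obj_i", "obj_s"},
    env={"a": "obj_a", "i": "obj_i"},
    store={"obj_a": (1, ("obj_s",)), "obj_i": 1, "obj_s": frozenset({3})},
)
post = pre  # no state change is needed for this illustration
term = ("app", "fetch", ("init", ("obj", "a")), ("init", ("obj", "i")))
result = value(term, pre, post, fun, {}, inputs={"a", "i"}, outputs=set())
print(result)
```

Because the array's elements are themselves objects, the result is the set object stored at index 1, not that object's value.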
2.2.2 Procedure Specifications 


A procedure specification specifies a subset of the set of all the possible operations that
are models of procedures. In this section, we define when an operation is a model of a
procedure specification.


In the next five subsections we will describe the language and the validity relation for
procedure specifications. First we consider procedure specifications ignoring exceptional
termination; second, we consider those with exceptional termination. In the subsequent three
sections, we describe special assertions to handle the creation of new objects, the mutation
of existing objects, and procedure objects.
2.2.2.1 Procedure Specifications Without Signals 


A procedure specification includes a name, a heading, a link, and a body. The heading
specifies the types of the input and output arguments. The link identifies the name of the trait
that defines an algebra that provides the values over which the input and output arguments
can range. The body is a pair of assertions that specify conditions relating the initial and final
values of the input and output arguments.
Syntax


ProcSpec ::= Procid = ProcHead Link ProcBody end
ProcHead ::= proc Args <Rets>
Link ::= uses Traitid
ProcBody ::= PreC PostC
PreC ::= pre Assn
PostC ::= post Assn
Args ::= (<Decl +,>)
Rets ::= returns (Decl +,)
Decl ::= Objid +,: TypeSpec
TypeSpec ::= Typeid
Some definitions: 


Def: The object identifiers in a procedure heading are formals of the procedure specification. 
The objects the formals denote are arguments. 


Def: Object identifiers in an Args are called input formals, and their objects, input arguments; 
object identifiers in a Rets are called output formals and their objects, output arguments. 


Def: The trait named in a procedure specification, pr, is called the used trait of pr. 


Checking 
For a procedure specification to be syntactically well-formed, we check that:


1. Each object identifier appearing in a pre-condition or
post-condition appears in the list of formals. The sets of input
formals and output formals are disjoint.


2. The assertions appearing in the pre- and post-conditions sort
check according to the function declarations of the used trait.


3. Output formals appear only in a post-condition.


4. Terms of the form τ↓, where τ∈Term, appear only in the
post-condition.


The header of a procedure specification is the same as that for a CLU procedure except that
identifiers are introduced in the returns clause for output arguments.
Meaning 


Informally, the pre-condition of a procedure specification defines a subset of the
universe of states over which the procedure must terminate. The procedure specification
does not say anything about those states which do not satisfy the pre-condition. The
post-condition defines for any valid initial state the final states that are acceptable.


Formally, a model of a procedure specification, Pr, is an operation. An operation is a
pair, <R, A>, where R is a relation on pairs of states, and A is an algebra. Each relation, R, of
an operation has the following properties (compare with Section 2.1.4):

1. dom(R) = {<O, e, s> | dom(e) = set of input formals ∧
                         ran(e) = set of input arguments}


2. ran(R) = {<O, e, s> | dom(e) = set of output formals ∧
                         ran(e) = set of output arguments}


The first property states that the environment of all input states is the set of bindings from
input formals (object identifiers) of a procedure specification to input arguments (objects).
The second property states that the range of the environment of all output states is the set of
bindings from output formals (object identifiers) to output arguments (objects).


We now define when an operation is a model of a procedure specification, Pr. Let Pr
have a pre-condition P, post-condition Q, and used trait Tr.


Def: For an operation, Op = <R, A>, Op is a model of Pr, i.e., Op ⊨ Pr, if and only if:
1. A is a model of Tr, and
2. <R, A> ⊨ <P, Q> (defined below).


Def: Let A = <Val, Fun>. <R, A> ⊨ <P, Q> if and only if:


∀μ:Varid → Val
∀σ [T[P](σ, ρ, A, μ) ⇒ [∃σ' <σ, σ'>∈R ∧ ∀σ' [<σ, σ'>∈R ⇒ T[Q](σ, σ', A, μ)]]]


This says that for all variable-to-value mappings (needed to handle free variables that appear
in assertions), for all states in which the pre-condition is satisfied, there exists some output
state in the relation (this gives us termination) and for all such output states (reached from an
input state in which the pre-condition is satisfied), the post-condition is satisfied. In the above
predicate, we define ρ to be some constant state (e.g., the null state) because although all
assertions are interpreted with respect to two states, it makes sense to refer to only initial
values of objects in a pre-condition. By the syntactic restrictions we place on what assertions
may appear in pre-conditions, the evaluation of an assertion in a pre-condition can ignore the
second state.
Example 


choose = proc (s: set) returns (i: int)
    uses SetOfint
    pre ~isEmpty(s↑)
    post has(s↑,i↓)
end


This procedure specification specifies that the choose procedure takes in one input object of
type set and returns one output object of type int. The pre-condition is satisfied only when the
value of the input set object is not empty. The post-condition asserts that the value of the
output integer object is in the value of the input set object. The function identifiers, isEmpty
and has, appear in the SetOfE trait, which is included in the SetOfint trait (Appendix A).
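
For a finite relation, the validity relation between an operation and a pre/post-condition pair can be checked by enumeration. The sketch below is illustrative only and rests on simplifying assumptions that are not part of the formal definition: assertions have no free variables (so the variable-to-value mapping is omitted), an input state is abbreviated to the value of the set argument, and an output state to the integer returned. The names is_model, good, wrong_result, and no_termination are hypothetical.

```python
def is_model(relation, input_states, pre, post):
    """Check termination and the post-condition for every state satisfying pre."""
    for sigma in input_states:
        if not pre(sigma):
            continue                    # nothing is required outside the pre-condition
        outputs = [out for (inp, out) in relation if inp == sigma]
        if not outputs:                 # no output state: termination fails
            return False
        if not all(post(sigma, out) for out in outputs):
            return False
    return True


def pre(s):
    """Pre-condition of choose: the input set is not empty."""
    return len(s) > 0


def post(s, i):
    """Post-condition of choose: the returned integer is in the input set."""
    return i in s


states = [frozenset(), frozenset({1, 2}), frozenset({3})]
good = {(frozenset({1, 2}), 1), (frozenset({1, 2}), 2), (frozenset({3}), 3)}
wrong_result = {(frozenset({1, 2}), 1), (frozenset({3}), 4)}
no_termination = {(frozenset({1, 2}), 1)}

print(is_model(good, states, pre, post))
print(is_model(wrong_result, states, pre, post))
print(is_model(no_termination, states, pre, post))
```

The relation good is nondeterministic on the set {1, 2} and is still accepted, since every related output satisfies the post-condition; the empty set imposes no obligation because it fails the pre-condition.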
2.2.2.2 Termination Conditions 


A CLU procedure may terminate in more than one way, depending on the input state.
We distinguish exceptional termination from normal termination by including in the procedure
heading all possible exceptional termination conditions of the procedure and each of their
associated returned objects.
Syntax
We add to the procedure specification heading a signals clause:


ProcHead ::= proc Args <Rets> <Sigs>
Sigs ::= signals (Exception +,)
Exception ::= Sigid <(Decl +,)>


and to the assertion language:


Assn ::= ... | returns | signals Sigid


As with a Rets clause, object identifiers in a Sigs clause are called output formals and their
objects, output arguments.
Checking
We additionally check for a well-formed procedure specification that:


1. Each signal identifier appearing in some signals assertion in the
post-condition appears in the heading.


2. signals and returns assertions appear only in the post-condition.
Meaning


Recall that a special terminates object is included as part of the set of existing objects
of all states. Upon normal termination of the procedure, the value of terminates is equal to
normal; upon exceptional termination, the value of terminates is equal to the Sigid in some
signals assertion. Formally, we extend the truth function, T, such that for all x∈Sigid:

T[returns](σ, σ', A, μ) = (σ'.s(terminates) = normal)
T[signals x](σ, σ', A, μ) = (σ'.s(terminates) = x)


The set, TermCond, is the union of Sigid and {normal}.
Example 


choose = proc (s1: set) returns (i: int) signals (emptySet(s2: set))
    uses SetOfint
    pre true
    post [~isEmpty(s1↑) ⇒ has(s1↑,i↓) ∧ returns] ∧
         [isEmpty(s1↑) ⇒ signals emptySet ∧ s2 = s1]
end


When choose terminates normally, terminates↓ = normal and choose returns an int object; when it
terminates exceptionally, terminates↓ = emptySet and choose returns a set object.
2.2.2.3 New Objects 


Procedures can create new objects. When a new object is created, the set of existing
objects, O, of the input state is extended by adding an element from the universe to O that was
previously not in O.
Syntax


Assn ::= ... | new ∅ | new Term +,


Checking


A new assertion can appear only in a post-condition. Let α be an assertion of the form
new t1, ..., tn, where t1, ..., tn are in Term. Subterms of α are the subterms of each term in the
list t1, ..., tn. We check that for the assertion α:


1. Each subterm of each term listed in t1, ..., tn sort checks.


2. Each term listed in t1, ..., tn denotes an object.


Meaning 


Recall that a state has three components, one of which is the set of existing objects, O.
We extend the truth function, T, such that for all terms t1, ..., tn in Term:
T[new ∅](σ, σ', A, μ) = (σ.O = σ'.O).
T[new t1, ..., tn](σ, σ', A, μ) = (σ.O ∩ {t1, ..., tn} = ∅) ∧ (σ'.O = σ.O ∪ {t1, ..., tn}).
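
Both clauses for new assertions compare only the object sets of the two states. A minimal sketch, assuming object sets are Python sets and that the terms listed in the assertion have already been evaluated to the objects they denote (the function name new_holds is illustrative):

```python
def new_holds(pre_objects, post_objects, created):
    """True if exactly the objects in `created` were added to the existing objects."""
    created = set(created)
    fresh = pre_objects.isdisjoint(created)          # none existed in the input state
    exact = post_objects == pre_objects | created    # nothing else appeared or vanished
    return fresh and exact


before = {"x", "y"}
print(new_holds(before, {"x", "y", "z"}, ["z"]))   # one new object z
print(new_holds(before, {"x", "y"}, []))           # the empty case: no new objects
print(new_holds(before, {"x", "y", "z"}, []))      # z appeared but was not listed
```

Passing an empty list corresponds to the first clause, since the disjointness test holds trivially and the union leaves the object set unchanged.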
Example 
create = proc () returns (s: set)
    uses SetOfint
    pre true
    post s↓ = empty ∧ new s ∧ returns
end


This procedure specification specifies that the create procedure when invoked returns a new,
initially empty set object. The previous examples can be strengthened by adding a new ∅
assertion to their post-conditions.
2.2.2.4 Mutation 


A procedure can mutate objects as well as return them. We add an assertion that
specifies that no objects are allowed to be mutated and an assertion that specifies what
objects a procedure is allowed to mutate.


Syntax
Assn ::= ... | mutates ∅ | mutates Term +,
Checking


A mutates assertion can appear only in a post-condition. Let α be an assertion of the
form mutates t1, ..., tn, where t1, ..., tn are in Term. Subterms of α are the subterms of each
term in the list t1, ..., tn. We check that for the assertion α:


1. Each subterm of each term in the list t1, ..., tn sort checks.


2. Each term in the list t1, ..., tn denotes an object.
Meaning 
We extend the truth function T as follows:


T[mutates ∅](σ, σ', A, μ) = T[∀y:T_obj (y∈σ.O ⇒ y↓ = y↑)](σ, σ', A, μ)
T[mutates t1, ..., tn](σ, σ', A, μ) =
    T[∀y:T_obj ((y∈σ.O ∧ ~(y = t1) ∧ ... ∧ ~(y = tn)) ⇒ (y↓ = y↑))](σ, σ', A, μ)
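
The mutates clauses say that every object existing in the input state, other than those listed, has the same value in both stores. A minimal sketch, assuming stores are dictionaries from objects to values and that the listed terms have already been evaluated to objects (the names mutates_holds, pre_store, and post_store are illustrative):

```python
def mutates_holds(pre_objects, pre_store, post_store, allowed):
    """True if every pre-existing object outside `allowed` keeps its value."""
    return all(
        post_store[obj] == pre_store[obj]
        for obj in pre_objects
        if obj not in allowed
    )


objects = {"x", "y"}
pre_store = {"x": frozenset({1, 2}), "y": frozenset({2, 3})}
post_store = {"x": frozenset({1, 2}), "y": frozenset({2})}

print(mutates_holds(objects, pre_store, post_store, {"y"}))   # y may change
print(mutates_holds(objects, pre_store, post_store, set()))   # nothing may change
```

Note that the check permits, but does not require, a change to the listed objects; leaving y untouched would also satisfy the first call.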


Example


intersect = proc (s1, s2: set)
    uses SetOfint
    pre true
    post ∀i:Int [has(s2↓,i) = has(s1↑,i) ∧ has(s2↑,i)]
        ∧ mutates s2 ∧ returns
end


This procedure specification specifies that intersect may change only the value of the second
input argument. Since s1 and s2 might denote the same input actual and s2 might be
mutated, we cannot guarantee that s1 is not mutated; the final value of s1 is not necessarily
equal to its initial value. The previous examples can be strengthened by adding the mutates
∅ assertion to the post-conditions.


2.2.2.5 Procedures as Objects 


In CLU, procedures are also considered as objects that can be passed to or returned
from procedures. For example, an input procedure argument, arg, to a procedure, pr, can be
applied to other input arguments of pr.
Syntax


The type of a procedure object is given by its procedure heading. We add to the syntax
of the interface language:
TypeSpec ::= ... | ProcHead
We add to the syntax of the assertion language:
Assn ::= ...| Assn {Term} Assn 
We call this new kind of assertion a “procedure object assertion (poa)."2 
Checking 


Let α be a poa, P{τ}Q, where P and Q are assertions and τ is a term. Subterms of α are
subterms of P, Q, and τ. We check that the procedure specification,
T
pre P
post Q
is syntactically well-formed. We also check that the subterms of τ sort-check.


2. Poa's should not be confused with partial or total correctness assertions that deal with procedure invocations.
Poa's deal with procedure objects.


Meaning 


Recall that the meaning of a procedure object is a pair consisting of a relation and an
algebra. The meaning of a poa, i.e., an assertion that refers to a procedure object, is given in
terms of the relation of the procedure object. We extend the truth function T as follows:
T[P{τ}Q](σ, σ', A, μ) = V[τ](σ, σ', A, μ) ⊨ <P, Q>,
where ⊨ was defined in Section 2.2.2.1.

Example 


Suppose we specify a procedure that copies the elements of an array using the
copyElem procedure as an input argument. If we wish to place a restriction on the copyElem
procedure object, we would write it in the pre-condition of copyArray. The ArrayOfElemObj
trait, which uses the Array trait, is given in Figure 8.


copyArray = proc (a1: array[elem], copyElem: proc (e1: elem) returns (e2: elem))
        returns (a2: array[elem])
    uses ArrayOfElemObj
    pre true{copyElem}(e1↑ = e2↓ ∧ new e2 ∧ mutates ∅ ∧ returns)
    post new a2 ∧ length(a1↑) = length(a2↓) ∧ low(a1↑) = low(a2↓)
        ∧ (∀j:Int low(a1↑) ≤ j ≤ high(a1↑) ⇒
            [fetch(a1↑,j) = fetch(a2↓,j) ∧ new fetch(a2↓,j)])
        ∧ mutates ∅ ∧ returns
end


We are not able in our specification language to specify the invocation of another 
procedure. That is, we are not able to make an assertion in the procedure specification, Pr1, 
about the application of a procedure, Pr2, to a list of arguments, ArgList, such as: 


apply(Pr2, ArgList) 


The reason is that we cannot know in which states to evaluate (i.e., apply V) the objects in
ArgList. To specify the effect we would want, because Pr2 may have side effects, we would
want to evaluate ArgList with respect to pairs of intermediate states of the invocation of Pr1,
and not the initial and final states.


ArrayOfElemObj: trait
    includes Array with [AOE for A, elem_obj for E]


Array: trait
    includes Integer, Elem
    introduces
        create: Int → A
        addh: A, E → A
        remh: A → A
        low: A → Int
        high: A → Int
        fetch: A, Int → E
        store: A, Int, E → A
        size: A → Int
    closes A over [create, addh]
    constrains [A] so that for all [i, i1, i2: Int, e, e1, e2: E, a: A]
        remh(create(i)) exempt
        remh(addh(a,e)) = a
        low(create(i)) = i
        low(addh(a,e)) = low(a)
        high(a) = low(a) + size(a) - 1
        fetch(create(i1),i2) exempt
        fetch(addh(a,e),i) = if i .eq (low(a) + size(a)) then e else fetch(a,i)
        store(create(i1),i2,e) exempt
        store(addh(a,e1),i,e2) = if i .eq (low(a) + size(a)) then addh(a,e2)
                                 else addh(store(a,i,e2),e1)
        size(create(i)) = 0
        size(addh(a,e)) = size(a) + 1


Figure 8. ArrayOfElemObj Trait
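
The equations of the Array trait in Figure 8 can be read as a recursive program over terms built from create and addh. The following is a sketch under assumptions of our own: array values are nested tuples, elements are arbitrary Python values, and the cases the trait marks exempt raise an error, since the trait says nothing about them.

```python
def create(i):
    """The empty array whose low bound is i."""
    return ("create", i)


def addh(a, e):
    """The array a extended with e at its high end."""
    return ("addh", a, e)


def size(a):
    return 0 if a[0] == "create" else size(a[1]) + 1


def low(a):
    return a[1] if a[0] == "create" else low(a[1])


def high(a):
    return low(a) + size(a) - 1


def remh(a):
    if a[0] == "create":
        raise ValueError("remh(create(i)) is exempt")
    return a[1]


def fetch(a, i):
    if a[0] == "create":
        raise ValueError("fetch(create(i1), i2) is exempt")
    _, rest, e = a
    # e sits at index low(rest) + size(rest), the slot just past rest.
    return e if i == low(rest) + size(rest) else fetch(rest, i)


def store(a, i, e2):
    if a[0] == "create":
        raise ValueError("store(create(i1), i2, e) is exempt")
    _, rest, e1 = a
    if i == low(rest) + size(rest):
        return addh(rest, e2)
    return addh(store(rest, i, e2), e1)


arr = addh(addh(create(1), "s1"), "s2")
print(low(arr), high(arr), size(arr))
print(fetch(arr, 1), fetch(arr, 2))
print(fetch(store(arr, 1, "x"), 1))
```

The array built here is the one from the earlier array-of-sets example, with the strings s1 and s2 standing in for the set objects.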


The copyArray example illustrates this failure of expressive power in our specification
language. We would like to be able to specify that any implementation of copyArray must
invoke the copyElem procedure such that the effects of executing the copyArray procedure
include the effects of executing the copyElem procedure. We specified in copyArray's
post-condition what the behavior of copyArray would be as if copyElem were invoked from
copyArray. Nowhere, however, do we actually state in the post-condition that copyElem must
be used--it is as if the copyElem argument were ignored. Hence, a procedure whose behavior
is the same as specified above, but is implemented without using the copyElem procedure
argument, would satisfy the procedure specification. In order to rule out such procedures, we
would need to be able to make an assertion such as:


∀j:Int low(a1↑) ≤ j ≤ high(a1↑) apply(copyElem, fetch(a1↑,j)).
2.2.3 Cluster Specifications 


A model of a cluster specification is an abstract data type. A cluster specification
includes a type identifier, a list of procedure specification identifiers, a link, and a body. The
link includes the name of a trait and a mapping from the type identifier to a sort identifier. The
body includes a set of procedure specifications.
Syntax


ClusSpec ::= Typeid = cluster is Procid +, ClusLink ClusBody end
ClusLink ::= Link ClusMap
ClusMap ::= provides MutFlag Typeid from Sortid
ClusBody ::= ProcSpec +
MutFlag ::= mutable | immutable


Def: The type identifier named by a cluster specification is called the defined type. 


Def: The trait named in the uses clause of a cluster specification, cl, is called the used trait of 
cl. 


Def: A procedure specification defined within a cluster specification is called a bound 
procedure specification. A procedure specification defined outside of all cluster 
specifications is called a free procedure specification. 


Checking 
We check that: 


1. All procedure specifications whose identifiers appear in the 
heading of a cluster specification are defined in the body of the 
cluster specification, and all identifiers of procedure specifications in 
the body of the cluster specification appear in the heading. 




2. The type identifier found in the type-to-sort mapping is the same 
as the type identifier that names the cluster specification. 


3. The sort identifier in the type-to-sort mapping is the name of a sort 
provided by the used trait. 


4. If the "flag" (in MutFlag) is mutable, some mutates t1, ..., tn
assertion must appear in a procedure specification in the cluster
specification where the defined type of the cluster specification is
the type of the object denoted by some term in t1, ..., tn. If the "flag"
is immutable, none of the objects denoted by terms in mutates
assertions in any of the procedure specifications can be of the
defined type.


5. Each procedure specification is well-formed. 


Meaning 


A model of a cluster specification is an abstract data type, which consists of a pair of a
set of objects and a set of operations. Let Cl be a cluster specification; Prs, the set of
procedure specifications of Cl; Tr, the used trait of Cl.

Def: For an abstract data type, T = <Obs, Ops>, T is a model of Cl, i.e., T ⊨ Cl, if and only if:
1. Obs = {o | o∈Obj ∧ the sort of o is T_obj},
2. ∀pr∈Prs ∃op∈Ops, op ⊨ pr,
3. ∀op_i = <R_i, A_i>∈Ops, A = A_i, where A is a model of Tr.
The type-to-sort mapping of the form, "provides (...) T from S," of the cluster specification
tells us that the value of TtoS for type T is S.
Example 


The set cluster specification (Figure 9) defines a mutable set abstract data type.
Singleton and union return new nonempty set objects. Delete might mutate its input set
argument, if doing so does not empty it; otherwise, it terminates exceptionally, signaling
emptiesSet. From the theory (Chapter 3) associated with this cluster specification, we can
show that no set object can be empty. Size returns the cardinality of its input set argument.


set = cluster is singleton, union, delete, size
    uses SetOfint
    provides mutable set from SI


    singleton = proc (i: int) returns (s: set)
        uses SetOfint
        pre true
        post s↓ = add(empty, i↑) ∧ new s ∧ mutates ∅ ∧ returns
    end


    union = proc (s1, s2: set) returns (s3: set)
        uses SetOfint
        pre true
        post ∀i:Int [has(s3↓,i) = has(s1↑,i) ∨ has(s2↑,i)]
            ∧ new s3 ∧ mutates ∅ ∧ returns
    end


    delete = proc (s: set, i: int) signals (emptiesSet)
        uses SetOfint
        pre true
        post [((card(s↑) ≥ 2) ∨ ~has(s↑,i↑)) ⇒
                (s↓ = remove(s↑,i↑) ∧ mutates s ∧ returns)] ∧
             [((card(s↑) .eq 1) ∧ has(s↑,i↑)) ⇒
                mutates ∅ ∧ signals emptiesSet] ∧
             new ∅
    end


    size = proc (s: set) returns (i: int)
        uses SetOfint
        pre true
        post i↓ = card(s↑) ∧ new ∅ ∧ mutates ∅ ∧ returns
    end
end


Figure 9. Set Cluster Specification (SetClusSpec) 


The set cluster specification example illustrates a clear distinction between a (value) sort
identifier and a type identifier. Although the trait SetOfint defines an "empty" value of sort SI,
no object of set type will ever have such a value since operations on objects of set type
construct only nonempty set objects. One could have specified a more conventional set type
with operations create and insert, so that a possible value for a set object would be "empty."


We will be returning to this somewhat contrived example in later chapters. We
henceforth refer to the specification of Figure 9 as SetClusSpec and repeat it in Appendix I for
future reference.
2.3 Summary 


In this chapter we described models of specifications and implementations, and we
described a kernel interface language. Models of traits are many-sorted algebras; models of
procedures and procedure specifications are operations, each of which is a pair consisting of
a relation on states, and an algebra; models of clusters and cluster specifications are abstract
data types, each of which is a pair consisting of a set of objects and a set of operations.


The kernel interface language contains procedure specifications and cluster
specifications. Interface assertions constitute the body of a procedure specification;
procedure specifications constitute the body of a cluster specification. The language of
interface assertions is built from the language of Larch assertions. We added notation (↑ and
↓) to be able to refer to the initial and final values of objects, since interface assertions are
interpreted with respect to two states. A procedure specification basically consists of a used
trait and a pair of assertions. We introduced special assertions to handle multiple termination
conditions, creation of new objects, mutation of existing objects, and procedure objects as
arguments. A cluster specification basically consists of a type name, a used trait, a
type-to-sort mapping, and a set of procedure specifications. In the next chapter we see how
to map a specification into the set of well-formed formulae of the theory it denotes.
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3. Theories 


In this chapter we switch to the syntactic viewpoint of specifications and
implementations. The two main objectives of this chapter are (1) to define when an
implementation satisfies a specification, and (2) to define precisely the theories denoted by
specifications and implementations.


Section 3.1 contains some definitions dealing with first-order theories. From these basic
definitions, in Section 3.2 we define the satisfaction relation between implementations and
specifications. Sections 3.3 and 3.4 define the theory of a specification and the theory of an
implementation, respectively. Their definitions depend on the definition of a type induction
principle, which we defer to Section 3.5. Section 3.5 builds up to defining this
principle, which is complicated because of the possibility of "exposing the rep" in CLU.
3.1 Definitions 


The following definitions dealing with theories and formal systems are provided as a
review of basic concepts in logic. We borrow from three introductory logic texts
[Shoenfield67, Mendelson64, Enderton72].
Theory and Formal System 
A theory is specified by giving a formal system, which has three parts: 


1. Its language. To specify a language, we specify its set of symbols
and its set of well-formed formulae (wff's). We denote the language
of a formal system F by L(F).


2. Its axioms. Each axiom must be a well-formed formula of the 
language of the formal system. 


3. Its rules of inference, which we sometimes call rules. Each rule of
inference states that under certain conditions, one formula, called
the conclusion of the rule, can be inferred from certain other
formulae, called the hypotheses of the rule. Each rule is an
inference relation among wff's.


A proof in F is a finite sequence of wff's, each of which is either an axiom or is the
conclusion of a rule whose hypotheses precede that wff in the proof. A theorem of F is a wff,
A, such that there is a proof whose last wff is A. Such a proof is called a proof of A. The
theory specified by a formal system F is the smallest set of formulae reflexively and transitively
closed over the set of axioms under the rules of F.
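The characterization of a theory as a smallest closed set can be read as a fixpoint computation. The following Python sketch is only an illustration under strong assumptions: formulae are modeled as strings and tuples, the single rule is a propositional modus ponens, and the closure is finite. The names theory and modus_ponens are hypothetical.

```python
def theory(axioms, rules, max_rounds=100):
    """Close a finite set of axioms under the given rules of inference.

    Each rule is a function from the current set of theorems to the set of
    conclusions it licenses. This only terminates for toy systems whose
    closure is finite; max_rounds guards against the general case.
    """
    theorems = set(axioms)
    for _ in range(max_rounds):
        new = set()
        for rule in rules:
            new |= rule(theorems)
        if new <= theorems:
            return theorems
        theorems |= new
    raise RuntimeError("closure did not stabilize")


def modus_ponens(theorems):
    # An implication is modeled as the tuple ("=>", hypothesis, conclusion).
    return {
        f[2]
        for f in theorems
        if isinstance(f, tuple) and f[0] == "=>" and f[1] in theorems
    }


axioms = {"p", ("=>", "p", "q"), ("=>", "q", "r"), ("=>", "s", "t")}
th = theory(axioms, [modus_ponens])
```

Here "q" and "r" become theorems because each has a proof ending in it, while "t" does not, since its hypothesis "s" is never derived.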


The logical symbols of a first-order language are the usual connectives, quantifiers, and
possibly an equality symbol, =. All other symbols, e.g., function symbols, are called
nonlogical. A first-order language L’ is an extension of the first-order language L if every
nonlogical symbol of L is a nonlogical symbol of L’. Let F and F’ denote formal systems that
respectively specify the first-order theories T and T’. T’ is an extension of T if L(F’) is an
extension of L(F) and every theorem of T is a theorem of T’. A conservative extension of T is
an extension T’ of T such that every formula of L(F) which is a theorem of T’ is also a theorem of
T.
Used and Imported Types 
The following definitions are based on the interface language. 


A used type of a procedure specification is a type whose identifier appears in its
heading. The type of any object that is an input or an output argument of that procedure is a
used type. A used type of a cluster specification is a used type of each of its procedure
specifications.


For a used type, T, the sort, TtoS(T), is called the used sort. For a rep type, T, the sort,
TtoS(T), is called the rep sort. For an abstract type, T, the sort, TtoS(T), is called the abstract
sort.


Recall from Chapter 2 that a bound procedure specification is a procedure specification that
is defined within a cluster specification. A free procedure specification is a procedure
specification that is defined outside all cluster specifications.


An imported type of a cluster specification is a used type of a cluster specification that is
not the defined type. An imported type of a bound procedure specification is a used type of
the procedure specification that is not the defined type of the cluster specification. So that we
can use the same terminology for free and bound procedure specifications, we define an
imported type of a free procedure specification as a used type of the procedure specification.
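These definitions amount to simple set computations over procedure headings. The Python sketch below is a hypothetical illustration: ProcSpec models only a heading, the type names are plain strings, and the used types of a cluster are taken to be the union over its procedure specifications (one reading of the definition above).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcSpec:
    """Only the heading of a procedure specification is modeled."""
    name: str
    inputs: tuple   # types of the input formals
    outputs: tuple  # types of the output formals


def used_types(proc):
    # Every type appearing in the heading is a used type.
    return set(proc.inputs) | set(proc.outputs)


def imported_types(proc, defined_type=None):
    # For a bound procedure, drop the defined type of its cluster;
    # for a free procedure (defined_type is None), nothing is dropped.
    return used_types(proc) - {defined_type}


def cluster_imported_types(procs, defined_type):
    # Assumption: a cluster's used types are collected from all its procedures.
    used = set().union(*(used_types(p) for p in procs))
    return used - {defined_type}


delete = ProcSpec("delete", inputs=("set", "int"), outputs=())
size = ProcSpec("size", inputs=("set",), outputs=("int",))
```

For the bound procedure delete of the set cluster, imported_types(delete, "set") yields only int; treated as a free procedure, the same heading would import both set and int.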
Syntactic Conventions 


For a predicate, P, of n arguments, we write P[X] to denote P(x1, ..., xn). For a predicate
P of 1 argument, and a list, X = x1, ..., xn, we write ∧P(X) to denote P(x1) ∧ ... ∧ P(xn). For
two lists of equal length, X = x1, ..., xn, and A = a1, ..., an, we write X = A for x1 = a1 ∧ ... ∧
xn = an. We write "Pr.pre" and "Pr.post" to denote the pre-condition and the post-condition
of the procedure specification Pr.
3.2 Satisfaction 


We define satisfaction of an implementation with respect to a specification in terms of 
theories so we need not directly refer to states. This point of view of couching definitions in 
terms of theories will lead to subsequent definitions of properties of specifications given in 
Chapter 5. We choose to use the term "satisfaction" instead of "correctness" because it
better suggests that a relation exists between an implementation and a specification, and
because in terms of theories, the notion of a "correct" theory seems strange.
Def: A procedure, ProcImp, satisfies the procedure specification, Pr, if and only if Th(Pr) ⊆
Th(ProcImp).


Def: A cluster, ClusImp, satisfies the cluster specification, Cl, if there exists a homomorphism,
A, from terms of the rep sort to terms of the abstract sort such that Th(Cl) ⊆ Th(ClusImp)
[T/R]_A.

[T/R]_A (read "T for R under A") means that T, the identifier denoting the abstract type, is
substituted for every occurrence of R, the identifier denoting the rep type, and A(r) is
substituted for every occurrence of a term of rep sort denoted by r.


We discuss how one would prove that an implementation satisfies a specification after
we have formally defined the theories of specifications and implementations. In Section 3.4.1
we discuss this for procedures; in Section 3.4.2, for clusters.
3.3 Theory of a Specification 


We are very careful to separate the trait language from the interface language, and the
interface language from the programming language. We must similarly be careful to
distinguish among the theory of a trait, the theories of procedure and cluster specifications,
and the theory of an implementation. In this section we begin with a formal definition of the
theory of a trait and then define the theories of procedure and cluster specifications.
3.3.1 Theory of a Trait 


Let Th(tr) denote the theory of the trait tr. Th(tr) is a conservative extension of first-order
many-sorted predicate calculus with equality. It is an extension by the addition of the function
identifiers of tr, the axioms of tr, and two rules of inference. The formal system is as follows:


Symbols 
Logical symbols: ~, ∧, ∨, ⇒, ⇔, ∀, ∃, =; the set of variable identifiers, VarId; true, false;
Nonlogical symbols: the set of function identifiers, OpId; the punctuation marks: comma,
colon, and parentheses.


Wff's
Wff ::= Assn


Assn ::= true | false | ~Assn | Assn ∧ Assn | Assn ∨ Assn
| Assn ⇒ Assn | Assn ⇔ Assn | (Assn)
| ∀ VarId: SortId Assn | ∃ VarId: SortId Assn
| Term = Term


Term ::= VarId | OpId<(Term+,)>


The precedence of the operators and quantifiers from highest to lowest is ~, ∀, ∃, ∧, ∨, ⇒,
⇔. When one connective is used repeatedly, the expression is grouped to the right.


Axioms 


1. All logical axioms of first-order predicate calculus with equality. 


a. All propositional axioms, e.g., ~P ∨ P.
b. Substitution axiom: ∀x:S (P) ⇒ (P[t/x]), where term t is substitutable for variable
identifier x in P (defined precisely below), and t and x are of sort S.
c. Identity axiom: t = t.
d. Equality axiom: s1 = t1 ∧ ... ∧ sn = tn ⇒ f(s1, ..., sn) = f(t1, ..., tn).
2. All equations of the form t1 = t2 in tr.
3. ~(true = false). All other inequations in Th(tr) are derivable from this one and the
meaning of =.


Rules of Inference 
1. Rules for first-order predicate calculus with equality: 


a. Modus ponens 


b. Generalization 


Here ∀x:S stands for universal quantification over all sorted variables x_i in P with
corresponding sorts S_i.


2. Sort Induction 


If "closes S over [op1, ..., opn]" appears in tr, the following is the corresponding
sort induction rule for predicate P(t) with free variable t of sort S.


P(x_1) ∧ ... ∧ P(x_k1) ⇒ P(op1(x_1, ..., x_k1))
...
P(x_1) ∧ ... ∧ P(x_kn) ⇒ P(opn(x_1, ..., x_kn))
-----------------------------------------------
∀t:S P(t)


where ki is the arity of opi, and P(x_j) = true if x_j is not of sort S.


3. Sort Reduction³
If "reduces S over [op1, ..., opn]" appears in tr, the following is the corresponding
sort reduction rule.


op1(x_1, ..., t1, ..., x_k1) = op1(x_1, ..., t2, ..., x_k1)
...
opn(x_1, ..., t1, ..., x_kn) = opn(x_1, ..., t2, ..., x_kn)
-----------------------------------------------------------
t1 = t2


where t1 and t2 are terms of sort S, and the x_i's do not occur in t1 or t2, and the ti's appear in
all argument positions of sort S.


3. Although in Chapter 1 we did not discuss sort reduction because we do not need it for our example traits, we
include it here for completeness.
Substitution 


In the substitution axiom we used the phrase "a term that is substitutable for a variable 
in a predicate," which we now define. 


Def: An occurrence of x in a formula P is bound if it occurs in a part of P of the form ∀x:S Assn
or ∃x:S Assn; otherwise, it is free in P.


Def: A term, τ, is substitutable for x in P if for each variable identifier y occurring in τ, no part
of P of the form "∀y:S B" or "∃y:S B" contains an occurrence of x that is free in B.

We write "P[τ/x]" (read "substitute τ for x in P") to denote the formula obtained from P by
substituting τ for the free occurrences of x in P, restricted to the cases where τ is
substitutable for x in P. We extend this notation for lists (of equal length) of terms and
identifiers, A and X, so that P[A/X] stands for the formula obtained from P by respectively
replacing all occurrences of x1, ..., xn by terms a1, ..., an, where each term ai is substitutable
for xi in P.
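The two definitions can be exercised on a tiny formula representation. The Python sketch below is a hypothetical encoding, not the thesis's notation: a variable is a string, a function application is a pair (op, [args]), formulae are tagged tuples, and sorts are omitted. The substitutable function follows the definition as stated above, checking every quantified part whose bound variable occurs in the term.

```python
QUANT = {"forall", "exists"}


def term_vars(t):
    """Variables of a term: a variable is a str, an application is (op, [args])."""
    if isinstance(t, str):
        return {t}
    _op, args = t
    return set().union(*(term_vars(a) for a in args))


def free_in(x, p):
    """True if x has a free occurrence in formula p."""
    tag = p[0]
    if tag == "eq":
        return x in term_vars(p[1]) or x in term_vars(p[2])
    if tag in QUANT:
        return p[1] != x and free_in(x, p[2])
    return any(free_in(x, q) for q in p[1:])


def substitutable(t, x, p):
    """No part (Qy B) of p, with y occurring in t, may contain x free in B."""
    tag = p[0]
    if tag == "eq":
        return True
    if tag in QUANT:
        y, body = p[1], p[2]
        if y in term_vars(t) and free_in(x, body):
            return False
        return substitutable(t, x, body)
    return all(substitutable(t, x, q) for q in p[1:])


def subst_term(t, x, s):
    if isinstance(s, str):
        return t if s == x else s
    op, args = s
    return (op, [subst_term(t, x, a) for a in args])


def subst(t, x, p):
    """P[t/x]: replace the free occurrences of x in p by t."""
    if not substitutable(t, x, p):
        raise ValueError("term is not substitutable for the variable")
    tag = p[0]
    if tag == "eq":
        return ("eq", subst_term(t, x, p[1]), subst_term(t, x, p[2]))
    if tag in QUANT:
        if p[1] == x:
            return p  # occurrences of x below are bound
        return (tag, p[1], subst(t, x, p[2]))
    return (tag,) + tuple(subst(t, x, q) for q in p[1:])


# exists y [x = add(y, e)]
p = ("exists", "y", ("eq", "x", ("add", ["y", "e"])))
empty = ("empty", [])
```

Substituting the variable y for x in p would capture y under the quantifier, so it is rejected, whereas the constant term empty is substitutable for x.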
3.3.2 Theory of a Procedure Specification


Let Th(Pr) denote the theory of the procedure specification Pr. Th(Pr) is a conservative
extension of the theory of the used trait of Pr. We extend the theory of the used trait of Pr by
adding to the formal system:


Symbols 
The identifier, Pr; terminal symbols of Assn's; the set of object identifiers, ObjId; curly
braces, ↑ and ↓.


Wff's
Wff ::= Assn | Assn {ProcId} Assn
Assn ::= % as in Section 3.3.1
| returns | signals SigId
| new ∅ | new Term+,
| mutates ∅ | mutates Term+,
| Assn {Term} Assn
Term ::= % as in Section 3.3.1
| ObjId | Term↑ | Term↓


Axiom 
Pr.pre[X] {Pr} Pr.post[X, Y] 


where X is the list of input formals of Pr; Y, the list of output formals. 


Rules of Inference 


1. Rule of Consequence 


P ⇒ P1, P1 {Pr} Q1, Q1 ⇒ Q
--------------------------
P {Pr} Q


where P, P1, Q, and Q1 are assertions. Recall that the validity of the assertions of the
hypotheses of this rule is with respect to two states. In particular, Q1 can refer to initial values
of objects referred to in P1.
2. Simplified Invocation Rule


X = A ∧ Y = B, Pr.pre[X] {Pr} Pr.post[X, Y]
-------------------------------------------
Pr.pre[A/X] {Pr} Pr.post[A/X, B/Y]


X is the list of input formals of Pr; Y, the list of output formals; A is the list of terms denoting
objects that are input arguments; B, the list of output arguments. This is a simplified case of
the CLU procedure invocation rule (see [Schaffert81]).⁴


3. All type induction rules of each imported type. We define this set of type induction rules
in Section 3.5.2.


Th(Pr) contains the theories of all of Pr's imported types. We intentionally excluded the
defined type from the set of imported types of a bound procedure specification so that its
theory would not include the theory of its defined type. This is done to avoid a circular
definition of the theory of a cluster specification (Section 3.3.3).
Example 
Recall the choose procedure specification: 


choose = proc (s: set) returns (i: int) 
uses SetOfint 
pre ~isEmpty(s↑)
post has(s↑,i↓) ∧ new ∅ ∧ mutates ∅ ∧ returns
end 


Th(choose) includes the trait theory, Th(SetOfint), which contains some axioms, e.g.,
isEmpty(empty) = true, and ∀x:SI e:E [isEmpty(add(x,e)) = false]; and the sort induction rule
with the hypotheses P(empty) and P(x) ⇒ P(add(x,e)), and the conclusion ∀t:SI P(t). An
example theorem that is derivable from the axioms and the rules in Th(SetOfint) is ∀s:SI
card(s) ≥ 0. Since the Integer trait is imported in the SetOfint trait, Th(choose) includes all
theorems on terms of Int sort.
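For reference, the sort induction rule just described can be written out as an inference rule, first for an arbitrary predicate P and then instantiated with P(t) taken to be card(t) ≥ 0. This is only a typeset restatement of the hypotheses and conclusion given above; it assumes, as those hypotheses suggest, that sort SI is closed over empty and add, and it does not reproduce the trait axioms for card that discharging the hypotheses would require.

```latex
\[
\frac{P(\mathit{empty}) \qquad P(x) \Rightarrow P(\mathit{add}(x,e))}
     {\forall t{:}SI\;\; P(t)}
\]

\[
\frac{\mathit{card}(\mathit{empty}) \ge 0 \qquad
      \mathit{card}(x) \ge 0 \Rightarrow \mathit{card}(\mathit{add}(x,e)) \ge 0}
     {\forall t{:}SI\;\; \mathit{card}(t) \ge 0}
\]
```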


An additional theorem in Th(choose) is ~isEmpty(s↑) {choose} (has(s↑,i↓) ∧ new ∅ ∧
mutates ∅ ∧ returns). Given the simplified invocation rule and the rule of consequence,
we derive theorems from this axiom. For example, the formula


~isEmpty(add(empty,1))
{choose}
has(add(empty,1),1) ∧ new ∅ ∧ mutates ∅ ∧ returns


is in Th(choose).


4. We do not need the part of the rule that handles recursive invocations.
3.3.3 Theory of a Cluster Specification 


Let Th(Cl) denote the theory of the cluster specification Cl. Th(Cl) is the union of the
theories of its procedure specifications closed under the following:


Rules of Inference 


1. All type induction rules of the defined type, T. See Section 3.5.2. 


Sometimes it is useful to include the theory of the defined type of the cluster
specification with the theory of a bound procedure specification. We denote this theory by
"Th(Pr+)." For notational convenience, if Pr is a free procedure, let Th(Pr+) be Th(Pr).
3.4 Theory of an Implementation 
3.4.1 Theory of a Procedure 


Let ProcImp be a procedure and Th(ProcImp) denote the theory of the procedure
ProcImp. The formal system that specifies Th(ProcImp) is as follows:


Symbols
Identifiers that appear in the procedure body; keywords of CLU and Assn's; curly braces, ↑
and ↓; ProcImp (the name of the procedure), if the body of ProcImp contains a recursive
invocation.


Wff's
Wff ::= Assn | Assn {Stmt} Assn
Stmt ::= CLU statements or expressions in the body of ProcImp
Assn ::= % as in Section 3.3.2


Axioms
All valid formulae of the form Assn {Stmt} Assn; in particular, consequences of the
simplified invocation rule for the procedure specifications that specify the behavior of the
procedures called from within the body of the procedure, ProcImp.


Rules of Inference

1. Rule of Consequence 

2. All proof rules of CLU [Schaffert81], including those for sequential, iterative, and 
conditional statements. 


3. All type induction rules of each imported type of ProcImp.
If ProcImp is defined within a cluster we also add:
4. All type induction rules for the rep type of the cluster.


From the proof rules of CLU and the rule of consequence, given the body of a
procedure, we derive the set of formulae involving the body of the procedure that are valid in
all models of ProcImp. These formulae comprise Th(ProcImp).
Proving Satisfaction 


In order to show that a procedure (implementation), ProcImp, satisfies a procedure
specification, Pr, we need to show that each theorem in Th(Pr) is in Th(ProcImp). Let Pr be:


Pr = proc (x1, ..., xn) returns (y1, ..., ym) signals (...)
pre P
post Q
end


and an implementation of Pr be:


ProcImp = proc (x1, ..., xn) returns (...) signals (...)
BODY
end


Let A and B be lists of terms denoting input and output objects, and X and Y be the lists
of input and output formals. Assume P[A/X] {Pr} Q[A/X, B/Y] is a theorem in Th(Pr). We
must show that P[A/X] {Pr} Q[A/X, B/Y] ∈ Th(ProcImp). To show this, we use the following
(non-recursive) procedure definition CLU proof rule,


x1 = a1 ∧ ... ∧ xn = an ∧ P1 {BODY} Q1
--------------------------------------
P1 {Pr} Q1


where P1 and Q1 are assertions, ai are terms denoting objects, and the procedure's local (not
own) variables must not occur free in P1 or Q1. Notice that ∀i[xi = ai] ⇒ ∀i[xi↑ = ai↑]. Any
local variables are freshly created on each invocation of the procedure, and are discarded
when it returns, so P1 and Q1 must not refer to them.


The conclusion of the procedure definition rule produces a specification of Pr.
Typically, we must then show that (1) P[A/X] ⇒ P1, and (2) Q1 ⇒ Q[A/X, B/Y]. Then from
the rule of consequence, we have:


P[A/X] ⇒ P1, P1 {Pr} Q1, Q1 ⇒ Q[A/X, B/Y]
------------------------------------------
P[A/X] {Pr} Q[A/X, B/Y]


which gives us that P[A/X] {Pr} Q[A/X, B/Y] ∈ Th(ProcImp).
3.4.2 Theory of a Cluster


Let Th(ClusImp) denote the theory of the cluster ClusImp. Th(ClusImp) is the union of the
theories of its procedures closed under the CLU proof rules. There are no type induction
rules associated with a cluster.
Proving Satisfaction 


Carrying out the following steps is sufficient to show that a cluster satisfies a cluster 
specification. 
1. Define a homomorphism A that maps terms of the rep sort to terms of the abstract 
sort. 
2. Define a rep invariant on terms of the rep sort used to help prove satisfaction of 
each procedure.
3. For each procedure, show it satisfies its corresponding procedure specification
under A and that the rep invariant is maintained.


These steps are no different from those used in usual proofs of satisfaction, where A is 
called an abstraction function [Hoare72, Guttag78, Guttag80a]. For our purposes, however, 
the abstraction function is defined on (sorted) terms and not on (typed) objects. We give an 
example of a proof of satisfaction between a cluster and a cluster specification in Appendix
I.2.
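The three steps can be mimicked by testing rather than by proof. The Python sketch below is hypothetical throughout: it assumes a list-based rep for sets, an abstraction function A from lists to frozen sets, and a rep invariant of "nonempty and duplicate-free". It also simplifies the setting above by applying A to values rather than to sorted terms.

```python
def A(rep):
    """Abstraction function: maps a rep value (a list) to an abstract set value."""
    return frozenset(rep)


def rep_ok(rep):
    """Rep invariant (assumed): nonempty and free of duplicates."""
    return len(rep) > 0 and len(rep) == len(set(rep))


# Hypothetical implementations working directly on the rep.
def singleton_imp(i):
    return [i]


def union_imp(r1, r2):
    return r1 + [e for e in r2 if e not in r1]


def size_imp(r):
    # Correct only because the rep invariant rules out duplicates.
    return len(r)


def check_cluster(samples):
    """Step 3, by testing rather than proof: every operation preserves the
    rep invariant and agrees with the abstract operation under A."""
    for r1 in samples:
        assert rep_ok(r1)
        assert size_imp(r1) == len(A(r1))
        for r2 in samples:
            r3 = union_imp(r1, r2)
            assert rep_ok(r3)
            assert A(r3) == A(r1) | A(r2)
    return True


samples = [singleton_imp(1), [1, 2], [3, 2, 5]]
ok = check_cluster(samples)
```

The rep invariant earns its place here: size_imp agrees with card under A only for reps without duplicates, which is the kind of fact step 2 is meant to supply to step 3.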
3.5 Type Induction


In the definitions of the formal systems that specify the theories of specifications and
implementations, we referred to the "type induction rules" of a type. We derive each rule
syntactically from cluster specifications. We argue that each rule is sound, however, because
it is derivable from the computational induction rule for CLU, which we assume is sound. In
Section 3.5.1, we define this computational induction rule. In Section 3.5.2, we define how to
derive syntactically a set of type induction rules for a cluster specification.
3.5.1 Computational Induction 


Recall that our model of computation is an alternating sequence of states and
statements starting in some initial state, σ0. For the states, σi, and the statements, Si, 1≤i≤n,
let a computation sequence be:
σ0 S1 σ1 ... σn-1 Sn σn


Informally, if some predicate P is true for each successive pair of states in the 
computation, then P is true of a computation. P is essentially an invariant over the 
computation sequence. We need to introduce a function, flip, on assertions because we want 
P to be true for all successive pairs of states in the computation, where the final state of one 
pair becomes the initial state of the next pair. Since assertions are interpreted with respect to 
two states, in order to use the same truth function T, which we defined in Chapter 2, we need 
to ignore one of the two states in which an invariant is interpreted. Hence, we use flip to make
all the arrows in an assertion point in the same direction.
Formally, we state the computational rule as follows. For some predicate P: 


true {S1} flip(P)
P {S2} flip(P)
...
P {Sn} flip(P)
----------------
true {S} flip(P)
for all statements S of the computation.
flip(P) is P with all occurrences of ↑ replaced by ↓, with a restriction on the form of P to
which flip is applicable, and a restriction on the flipping of arrows in a procedure object
assertion (poa):


1. Only assertions whose value depends on a single state can appear
in P. Specifically, no returns, signals, new, or mutates assertions
are allowed in P. Otherwise, we could not properly ignore one of the
two states in which an assertion is interpreted.


2. If P contains an assertion about a procedure object of the form
P1{τ}Q1, where P1 and Q1 are assertions and τ is a term denoting a
procedure object, we do not replace ↑ by ↓ in P1 or Q1. This is
because P1 and Q1 are not interpreted with respect to the same
state as that for P1{τ}Q1.⁵


We emphasize that the first restriction is only for the computational induction rule and
not on all assertions. For example, formulae of the form P {Pr} Q where Q has returns,
signals, new, or mutates assertions are still well-formed, as in the axiom of Th(Pr), Pr.pre
{Pr} Pr.post.


Henceforth, we write P↓ for flip(P). Notice we must also be careful when using the usual
Hoare proof rules for statements like sequential composition, conditional, and loops. For
example, the sequential composition rule should be:


P {S1} Q↓, Q {S2} R↓
--------------------
P {S1;S2} R↓


Similar syntactic transformations must be performed on all other proof rules so that they can
be applied appropriately in proofs.
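The flip function and its two restrictions can be sketched as a syntactic transformation. The Python code below assumes a hypothetical encoding that is not the thesis's notation: assertions and terms are nested tuples whose first element is a tag, ("pre", t) stands for t with an up arrow, ("post", t) for t with a down arrow, and ("poa", P1, tau, Q1) for a procedure object assertion. Whether the term tau itself is flipped inside a poa is an assumption of the sketch.

```python
TWO_STATE = {"returns", "signals", "new", "mutates"}


def flip(p):
    """Replace every initial-value marker ("pre") by a final-value marker ("post").

    Both restrictions on flip are enforced: two-state assertions are
    rejected, and the arrows inside P1 and Q1 of a poa are left alone.
    """
    if not isinstance(p, tuple):
        return p                      # identifiers and constants
    tag = p[0]
    if tag in TWO_STATE:
        raise ValueError(tag + " assertions may not appear in P")
    if tag == "poa":
        # P1 {tau} Q1: only the term tau is flipped (an assumption).
        _, p1, tau, q1 = p
        return ("poa", p1, flip(tau), q1)
    if tag == "pre":
        return ("post", flip(p[1]))
    return (tag,) + tuple(flip(q) for q in p[1:])


# card(t up-arrow) > 0
P = (">", ("card", ("pre", "t")), 0)
flipped = flip(P)

poa = ("poa", ("pre", "x"), ("pre", "f"), ("post", "y"))
flipped_poa = flip(poa)
```

Applied to P, the result refers only to the final value of t, which is what allows the same predicate to be chained across successive pairs of states.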
3.5.2 Type Induction Principle 


A cluster specification is ideally more than just a syntactic way of grouping together a
set of procedure specifications. It gives us a way of localizing the specifications of the
behaviors (input-output relations) of all operations on objects of the defined type. This
modularization should give a means of localizing the proof of invariant properties of all
objects of the defined type. We would like to associate with a cluster specification a type
induction rule and assert that it is a sound rule in any cluster that satisfies the cluster
specification. This rule would allow us to infer that some property is true of all objects of type
T by considering only a subset of the procedures that create and mutate objects of type T. In
this section we see that defining such an induction rule is not quite so straightforward
because of situations that arise in implementations that "expose the rep."


5. Recall that the truth of such a poa is defined to be true if the value of τ, i.e., some relation-algebra pair, satisfies
the pair of assertions <P1, Q1> (Section 2.2.2.5).


In Section 3.5.2.1 we show how to derive this desired type induction rule for a cluster
specification and give an example of a derivation. In Section 3.5.2.2, we explain the problem
of exposing the rep that can invalidate this type induction rule, and so in Section 3.5.2.3 we
extend the derivation procedure to allow for some implementations that expose the rep.
3.5.2.1 A Type Induction Rule 


We first state how to derive the type induction rule for a type T, then explain the rule,
then justify it.


For a procedure specification, let T1 be the sublist of its input formals that are of type T;
T2, the sublist of output formals that are of type T. (Recall by our definitions in Chapter 2,
formals in a signals clause are included as output formals of a procedure header.) T1 and T2
are sublists because some input and output formals may not be of type T. Let i and j be the
lengths of the lists T1 and T2, respectively.


Method: Derivation of a type induction rule for predicate, P(t), with free variable t of type T.


Hypotheses: The hypotheses are named HB, HP, and HM for basic, producing, and mutating
constructors (to be defined), respectively.


1. For each bc ∈ BC(T), add an HB hypothesis of the form:
true {bc} ∧P↓(T2)
2. For each pc ∈ PC(T), add an HP hypothesis of the form:
∧P(T1) {pc} ∧P↓(T2)
3. For each mc ∈ MC(T), add an HM hypothesis of the form:
∧P(T1) {mc} ∧P↓(T1) ∧ ∧P↓(T2)

where P is restricted as for the computational induction rule (Section 3.5.1). ∧P↓(T1) can be
conjoined to ∧P↓(T2) to the right of the braces in the first two kinds of hypotheses, but by the
definitions of basic and producing constructors (defined below), it would be vacuously true.


Conclusion: true {S} ∀t:T P↓(t) for all statements S.


(end of Method)


The sets, BC(T), PC(T), and MC(T), represent the sets of specifications of procedures 
that can create and mutate objects of type T. These sets are not necessarily disjoint since a 
procedure might do both. Roughly speaking, the differences among the three are whether 
any input arguments are of type T, whether any output arguments are of type T, and whether 
any objects of type T are mutated. BC(T) is the set of basic constructors of type T. A basic 
constructor of type T is a procedure specification that has no input arguments of type T; 
whose pre-condition contains no explicit assertions about objects of type T; and whose 
post-condition specifies the return of a new object of type T. For example, singleton of 
SetClusSpec (Appendix I, Figure 9) is a basic constructor of type set. PC(T) is the set of
producing constructors of type T. A producing constructor of type T is a procedure
specification that has both input and output formals of type T; whose post-condition specifies
the return of a new object of type T; and for all assertions in its post-condition of the form
mutates t1, ..., tn, none of the types of the objects denoted by the terms in the list t1, ..., tn is
T. For example, union of SetClusSpec is a producing constructor of type set. MC(T) is the set
of mutating constructors of type T. A mutating constructor of type T is a procedure
specification that has an assertion in its post-condition of the form mutates t1, ..., tn, and T is
the type of the object denoted by some term in the list t1, ..., tn. For example, delete of
SetClusSpec is a mutating constructor of type set.
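The three definitions can be turned into a small classifier over summaries of procedure specifications. The Python sketch below is hypothetical: the Spec record and its fields are invented for illustration, each specification is summarized by type names rather than by its assertions, and the headings given for singleton and union are assumptions about SetClusSpec.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Spec:
    name: str
    inputs: tuple              # types of input formals
    outputs: tuple             # types of output formals
    new_types: tuple = ()      # types of objects the post-condition declares new
    mutated_types: tuple = ()  # types of objects listed in mutates assertions
    pre_mentions: tuple = ()   # types the pre-condition explicitly constrains


def is_basic(spec, T):
    return (T not in spec.inputs and T not in spec.pre_mentions
            and T in spec.outputs and T in spec.new_types)


def is_producing(spec, T):
    return (T in spec.inputs and T in spec.outputs
            and T in spec.new_types and T not in spec.mutated_types)


def is_mutating(spec, T):
    return T in spec.mutated_types


def classify(specs, T):
    return {
        "BC": {s.name for s in specs if is_basic(s, T)},
        "PC": {s.name for s in specs if is_producing(s, T)},
        "MC": {s.name for s in specs if is_mutating(s, T)},
    }


specs = [
    Spec("singleton", ("int",), ("set",), new_types=("set",)),
    Spec("union", ("set", "set"), ("set",), new_types=("set",)),
    Spec("delete", ("set", "int"), (), mutated_types=("set",)),
    Spec("size", ("set",), ("int",)),
]
classes = classify(specs, "set")
```

Note that size falls into none of the three sets, so it contributes no hypothesis to the type induction rule for set.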


To justify the rule, consider the computational induction rule given a predicate, P(t), on
objects of type T. We need be concerned only with invocations of procedures that create and
manipulate objects of type T. We reduce the number of hypotheses of the computational
induction rule to obtain a type induction rule by retaining only those relevant hypotheses.
Notice we have available, however, only the procedure specifications and not their
implementations. Hence, the hypotheses we select from the computational induction rule can
be based solely on the specification of the procedures, and not their implementations.
Example 1 


Consider our simple example, SetClusSpec. Following the method given, we have
instances of each of the three kinds of hypotheses, HB, HP, and HM, to obtain the following
type induction rule:


true {singleton} P↓(s)
P(s1) ∧ P(s2) {union} P↓(s3)
P(s) {delete} P↓(s)
----------------------------
true {S} ∀t:set P↓(t)


Suppose P(t) is card(t↑) > 0. The hypotheses are:


HB true {singleton} card(s↓) > 0
HP card(s1↑) > 0 ∧ card(s2↑) > 0 {union} card(s3↓) > 0
HM card(s↑) > 0 {delete} card(s↓) > 0


The conclusion is true {S} ∀t:set[int] card(t↓) > 0 for all statements S.


We use the axiom of the theory of the procedure specification and the rule of
consequence to show the validity of each of these hypotheses. For example, to show the
validity of HP above, we have:


1. Assume [card(s1↑) > 0 ∧ card(s2↑) > 0].
2. From the above assumption and the sort induction rule associated with Th(SetOfint),
∀i:Int [has(s3↓,i) = has(s1↑,i) ∨ has(s2↑,i)] ⇒ card(s3↓) > 0
3. Th(union) contains the axiom,
true {union} [new s3 ∧ mutates ∅ ∧ returns
∧ ∀i:Int [has(s3↓,i) = has(s1↑,i) ∨ has(s2↑,i)]]
4. So, by the rule of consequence (union.post ⇒ 2) we have:
HP: card(s1↑) > 0 ∧ card(s2↑) > 0 {union} card(s3↓) > 0


Similar reasoning is used to show the validity of HB and HM for singleton and delete.
Therefore, we can conclude that the size of all objects of type set is greater than zero. Notice
that this is a very different theorem from that in Th(SetOfint), ∀x:SI card(x) ≥ 0.
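The conclusion of Example 1 can be checked empirically, though of course not proved, by running random computations built from the three constructors and testing the invariant after every statement. The Python sketch below is a hypothetical model: set objects are Python sets, delete returns False where the specification signals emptiesSet, and the function and variable names are invented.

```python
import random


def singleton(i):
    return {i}


def union(s1, s2):
    return s1 | s2          # a new object; the arguments are not mutated


def delete(s, i):
    # Mirrors the delete specification: refuse to empty the set.
    if len(s) == 1 and i in s:
        return False        # stands in for "signals emptiesSet"
    s.discard(i)
    return True


def run_computation(steps, seed=0):
    """Run a random sequence of cluster operations and check card(t) > 0
    for every set object after every step."""
    rng = random.Random(seed)
    objects = [singleton(rng.randrange(5))]                      # HB
    for _ in range(steps):
        op = rng.choice(["singleton", "union", "delete"])
        if op == "singleton":
            objects.append(singleton(rng.randrange(5)))          # HB
        elif op == "union":
            objects.append(union(rng.choice(objects),
                                 rng.choice(objects)))           # HP
        else:
            delete(rng.choice(objects), rng.randrange(5))        # HM
        if not all(len(s) > 0 for s in objects):
            return False
    return True


invariant_held = run_computation(steps=500)
```

Each branch of the loop corresponds to one kind of hypothesis, and the check after each step corresponds to the conclusion quantified over all statements S.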
3.5.2.2 Exposing the Rep 


We have defined an object to belong to only one type. In CLU, however, this property of
objects does not always hold since one can write programs where an object belongs to more
than one type, e.g., both the abstract and the rep type. CLU type checking does not prevent
this situation from arising because it cannot detect it syntactically. Since operations of both
types might possibly mutate such an object, the desired locality principle of a cluster can be
violated; our single type induction rule might be invalid.


When some operations besides those specified in the cluster specification defining T
can mutate objects of type T (by means other than invoking procedures of the cluster), we say
that "the rep is exposed." There are two ways in which such a situation may arise. Both
involve sharing of objects of mutable type.⁶ One way is when the rep type object and the
abstract type object are the same object. We call this "exposing the whole rep." Any mutating
operation of the rep type can then mutate an object of the abstract type, and vice versa. A
simple example of this is shown in Figure 10. Exposing the whole rep can (and most of the
time should) be avoided. In the queue example, the make procedure should copy the array
before returning the queue to avoid exposing the rep. Since it does not, a mutating array
operation, e.g., addh, that changes the original input array object also changes the returned
queue object since they are the same object.
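The same aliasing arises in any language with shared mutable objects. The Python sketch below is a hypothetical analogue of the queue example, not a translation of the CLU cluster: make_exposing wraps the caller's list directly, while make_copying applies the repair suggested above.

```python
class Queue:
    """Hypothetical queue whose rep is a Python list."""

    def __init__(self, rep):
        self._rep = rep

    @staticmethod
    def make_exposing(r):
        # Like make in the queue example: the rep object is the argument itself.
        return Queue(r)

    @staticmethod
    def make_copying(r):
        # The suggested repair: copy the array before wrapping it.
        return Queue(list(r))

    def size(self):
        return len(self._rep)


a = [1, 2, 3]
q_exposed = Queue.make_exposing(a)
q_safe = Queue.make_copying(a)
a.append(4)   # plays the role of array$addh on the original array
```

After the append, q_exposed has changed size without any queue operation being invoked, whereas q_safe is unaffected.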


queue = cluster is ..., make, ...
    rep = array[elem]


    make = proc (r: rep) returns (cvt)
        return(r)
        end make


    end queue
Figure 10. Exposing the Whole Rep for Queues


6. If we had only immutable types or if we eliminated sharing in CLU, the problem of exposing the rep would not
exist.


A second way an object of type T can be mutated by an operation other than those
specified in the cluster specification defining T is by establishing sharing with an object of
type T1 whose value is incorporated in the value of the rep of type T. We call this "exposing
the subrep." Whether or not an implementation exposes its subrep is relative to a
specification. For example, the read procedure in Figure 11 would be exposing the subrep if
the specification of read were to require that the top of the input stack returned be a new
object. Since read returns the top of the input stack argument, without copying, then any
changes made to that set would appear to change the value of the stack. Again, to avoid this
sharing, a copy of the top of the sequence should be made before returning it or pushing it.
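A hypothetical Python analogue makes the effect of subrep sharing visible. The Stack class below is not a translation of the CLU cluster in Figure 11: its rep is a list of mutable Python sets, read_sharing returns the top object itself, and read_copying returns a new object with the same value. For brevity, grow here also pushes its argument without copying.

```python
class Stack:
    """Hypothetical stack of mutable sets; the rep is a list of set objects."""

    def __init__(self):
        self._rep = []

    def grow(self, s):
        self._rep.append(s)

    def read_sharing(self):
        # Like read in the stack example: returns the top object itself.
        return self._rep[-1]

    def read_copying(self):
        # Returns a new object with the same value as the top.
        return set(self._rep[-1])

    def value(self):
        # Abstract value of the stack, as a tuple of immutable sets.
        return tuple(frozenset(s) for s in self._rep)


st = Stack()
st.grow({1, 2})
before = st.value()

copy = st.read_copying()
copy.discard(1)                 # mutates only the copy
after_copy = st.value()

top = st.read_sharing()
top.discard(1)                  # a set operation, not a stack operation
after_share = st.value()
```

The value of the stack is unchanged after mutating the copy, but it changes after mutating the shared top, even though no stack operation was invoked in between.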


stack = cluster is empty, grow, read, ...
    rep = sequence[set]


    empty = proc () returns (cvt)
        return (rep$new())
        end empty


    % grow will only push on the input stack a set whose size is less than 64
    grow = proc (s1: cvt, s: set) returns (cvt)
        if set$size(s) > 64 then return (s1) end
        seq: rep := rep$new()
        for e: set in rep$elements(s1) do
            seq := rep$addh(seq, e)
            end


        return (seq)
        end grow


    read = proc (t: cvt) returns (set) signals (bounds)
        return (rep$top(t)) resignal (bounds)
        end read


    end stack


set = cluster is ..., delete, ...
    rep = array[int]


    % delete mutates s if i is in s
    delete = proc (s: cvt, i: int)


        end delete
    end set


Figure 11. Exposing the Subrep for Stacks


One could argue that implementations that expose the rep (of any kind) should be
banned. There are two reasons why such a restriction is too severe. The first is that in
practice, one sometimes intentionally wants such sharing among objects, perhaps for
efficiency reasons, and cleverly exploits it. The second is that there is no reasonable way to
ban such sharing, i.e., to detect it syntactically. Before we proceed with the definitions of
these induction rules, we point out that CLU, which cannot completely enforce a restriction
against exposing the rep type, can still be used to construct "true" abstract types. The
programmer need only follow a programming discipline that ensures that reps are not
exposed or that sharing of mutable objects is not abused.
3.5.2.3 Type Induction Rule Revisited 


If we were to associate a type induction rule as thus far defined with each cluster 
specification then an implementation that exposes the rep might violate this rule and not 
necessarily satisfy the cluster specification. In deciding whether an implementation satisfies a 
specification, we could either be very restrictive and outlaw any implementations that expose 
the rep or be less demanding. We choose to be less demanding and allow for some 
implementations that expose their subrep. In doing so, we choose not to associate a single
type induction rule with a cluster specification, but rather a set of rules. We call this set of 
rules, the type induction principle of the cluster specification. Each rule is dependent on the 
form of a predicate, P(t), which we would like to assert holds true for all objects of type T 
between all pairs of successive states in any computation. In essence, the predicate is shown 
to be an invariant for the cluster specification. Since there is one rule per predicate, one 
could take an alternative viewpoint that we are associating a set of invariants with a cluster 


specification, where each invariant is a predicate corresponding to a rule. 


Notice that hypotheses (1), (2), and (3) of the derivation method (Section 3.5.2.1) are 
independent of the form of the predicate P(t). However, an object of type T might contain 
objects of mutable type, M, and for any predicate containing a term that refers to values of 
these subobjects, the truth of the predicate depends on the behavior of all procedures that 
possibly change the values of objects of type M. We need to show that the predicate P(t) 


remains invariant for each mutating constructor of type M, and hence include a hypothesis for 


each mc ∈ MC(M).

Thus, we add the following rule to the derivation of a type induction rule.

Method (continued): Derivation of a Type Induction Rule

4. For each subterm, τ, in P(t) that denotes an object of mutable type M (≠ T), if τ is
followed by ↑ or ↓, add a τ-instance (defined below) of HM for each mc ∈ MC(M).
(end of Method)

Def: Let P(t) be a predicate with t a free variable in P. Let τ be a subterm of P, and t be a
subterm of τ, where τ denotes an object of type M. A τ-instance of HM for Pr and predicate
P(t) is:

    x1 = τ[v1/t] ∧ ... ∧ xn = τ[vn/t] ∧
    [P[v1/t] ∧ ... ∧ P[vn/t]]
    {Pr}
[Pf[v,/t] A we A Phy At] A Pil, piAA..A Ply, emt] 


where 

1. Each vi in P[vi/t] or P'[vi/t] is a fresh variable. There is a vi for each of Pr’s input
and output formals xi of type M. We need these fresh variables because Pr might have more
than one argument of type M.

2. P'[vi/t] is (P[vi/t])', i.e., substitute vi for t, then flip.


Example 2 


Suppose we specify the type stack of small sets, where sets are mutable, and that the
identities of set objects are pushed onto the stack, not just their values. Figures 12 and 13⁷
give the cluster specification for the type stack of small sets and for the trait it uses. The
implementation of Figure 11 satisfies the cluster specification of Figure 12, even though the
implementation exposes its subrep. An implementation that does not expose its rep, e.g., one
in which the read procedure returns a copy of the top of the stack, would also satisfy the
specification since the post-condition of the read procedure specification specifies only that
the value of the set object returned be the same as the value of the top of the input stack
object.

7. These two figures with minor variations are repeated in Appendix I for future reference.

stack = cluster is empty, grow, read
    uses StackOfSS
    provides immutable stack from SSS

    empty = proc () returns (st: stack)
        pre true
        post st↓ = null ∧ new st ∧ mutates ∅ ∧ returns
    end

    grow = proc (s1: stack, s: set) returns (s2: stack)
        pre card(s↑) < 64
        post s2↓ = push(s1↑, s) ∧ new s2 ∧ mutates ∅ ∧ returns
    end

    read = proc (t: stack) returns (s: set)
        pre ~isNull(t↑)
        post s↓ = top(t↑)↑ ∧ mutates ∅ ∧ returns
    end

end stack

Figure 12. Stack Cluster Specification

Suppose instead we specified in read’s post-condition:

    s↓ = top(t↑)↑ ∧ new s ∧ mutates ∅ ∧ returns

i.e., that not only the value of the set object returned be the same as the value of the top of the
stack, but also that the set object be new, then the implementation of Figure 11 would not
satisfy the specification.


Returning to the specification of Figure 12, for any predicate, P, involving the values of 
sets as well as the values of stacks, it would be incorrect to assume we could prove P without 
considering the cluster specification for sets--we must include hypotheses for all procedure 


specifications that mutate set objects. 


StackOfSS: trait
    includes SetOfInt,
        StackOfE with [SSS for C, set[int]_obj for E]

StackOfE: trait
    includes Integer
    introduces
        null: → C
        push: C, E → C
        top: C → E
        pop: C → C
        isNull: C → Bool
        isIn: C, E → Bool
        size: C → Int
    closes C over [null, push]
    constrains [C] so that for all [s: C, e, e1: E]
        top(null) exempt
        top(push(s,e)) = e
        pop(null) exempt
        pop(push(s,e)) = s
        isNull(null) = true
        isNull(push(s,e)) = false
        isIn(null,e) = false
        isIn(push(s,e),e1) = if e .eq e1 then true else isIn(s,e1)
        size(null) = 0
        size(push(s,e)) = size(s) + 1

Figure 13. Traits for Stacks


Hence, our induction rule must include a hypothesis for the delete procedure
specification of sets. For example, suppose we want to prove ~isNull(t↑) ⇒ [card(top(t↑)↑) <
64] for t of type stack. We have instances of HB and HP for empty and grow as follows:

HB  true {empty} ~isNull(st↓) ⇒ [card(top(st↓)↓) < 64]
HP  ~isNull(s1↑) ⇒ [card(top(s1↑)↑) < 64] {grow}
    ~isNull(s2↓) ⇒ [card(top(s2↓)↓) < 64]

We also need to add τ-instances of HM for the term, τ = top(t↑), since top(t↑) denotes an
object of mutable type set and top(t↑) is followed by an ↑ in P. The delete procedure
specification is the only mutating constructor of type set so we have a top(t↑)-instance of HM
with the fresh variable, v1, substituted in for t in top(t↑).

HM  s = top(v1↑) ∧ ~isNull(v1↑) ⇒ [card(top(v1↑)↑) < 64] {delete}
    ~isNull(v1↓) ⇒ [card(top(v1↓)↓) < 64]

The conclusion of this rule is true {S} ∀t:stack[set] ~isNull(t↓) ⇒ card(top(t↓)↓) < 64 for all
statements S. We show the validity of the hypotheses of this rule in Appendix II.1.

If we do not include the hypotheses for mutating constructors of type set, we could
possibly prove a statement that is not true. For example, suppose SetClusSpec has a
procedure that mutates its input set argument by inserting integers into it. If called, this
procedure could possibly change the value of a set pushed on the stack and we could not
ensure that the size of all sets in the stack would be less than 64. If we had not included the
hypothesis for this add procedure, we could have proved a false statement: that the size of
the top of all stacks is less than 64.
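
The failure mode just described can be observed directly in a small executable model. This is a Python sketch under stated assumptions, not the CLU implementation of Figure 11: stacks are modeled as immutable tuples, sets as Python `set` objects, and the names `grow`, `read`, `invariant`, and `LIMIT` are hypothetical stand-ins.

```python
LIMIT = 64


def grow(stack, s):
    """Push the set object s itself (its identity, not a copy) if it is small."""
    if len(s) >= LIMIT:
        return stack
    return stack + (s,)  # stacks are immutable tuples, so a new one is returned


def read(stack):
    """Return the top set object without copying it."""
    return stack[-1]


def invariant(stack):
    """P(t): a non-null stack has a top whose cardinality is below LIMIT."""
    return len(stack) == 0 or len(stack[-1]) < LIMIT


small = set(range(10))
st = grow((), small)
ok_after_grow = invariant(st)

# A mutating set operation that is not part of the stack cluster:
# it inserts integers into a set object that is shared with the stack.
small.update(range(100))
ok_after_insert = invariant(st)
```

No stack operation ran between the two checks, yet the predicate changes from true to false. This is why a hypothesis is needed for every mutating constructor of the set type.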
3.6 Summary 


In this chapter we gave a precise definition of when an implementation satisfies a 
specification in terms of their theories. We defined theories of specifications and 
implementations by precisely defining their formal systems. We also described in detail the 
derivation of a type induction principle associated with a cluster specification and gave 


examples of its use. 


4. Extended Interface Language for CLU 


In this chapter we describe some extensions to the kernel interface language that make 
it easier to read and write specifications, and some that make it easier to specify certain 
features particular to CLU. The design objectives in extending the kernel interface language 


were: 


1. To enhance the readability of specifications, 
2. To encourage a stylized form of writing specifications, 


3. To be applicable to interface languages for other programming languages. 


Section 4.1 presents four simple syntactic extensions. The prime motivation for
introducing them is to enhance the readability of specifications. The meaning of each new
construct is given by a translation into the kernel language. For each extension we also give any
necessary additions to the syntax and checking of specifications. Section 4.2 discusses
extensions to both the syntax and semantics of the interface language to handle three
features particular to CLU: own variables, iterators, and parameterization.
4.1 Simple Extensions 


The assertions in the pre- and post-conditions of a procedure specification tend to be 
unwieldy and long. In order to streamline the appearance of each of these assertions and to 
highlight the significant ones (e.g., mutates), we introduce the following four changes to the 
kernel language: a default used trait, a separate mutates clause, a default termination 


condition value, and multiple pre- and post-conditions. 


4.1.1 Default Used Trait 


Naming the used trait in a procedure specification becomes optional. For a free 
procedure specification, since the theory of the used trait must include the theories of each of 
the used traits of the cluster specifications that define the used types of the procedure 
specification, we can always introduce a new trait that includes (in the Larch sense) the used 
traits associated with the used types. For bound procedure specifications, if the name of the 
used trait does not explicitly appear, we define the default used trait to be the used trait of the 


cluster specification to which the procedure specification is bound.
Syntax 
    ProcSpec ::= ProcId = ProcHead <Link> ProcBody end
Translation 
For the following free procedure specification,

    Pr = proc (...) returns (...) signals (...)
        pre P
        post Q
    end

let {Tr1, ..., Trn} be the set of used traits of the used types of the input and output arguments
to Pr. The above translates to:

    Pr = proc (...) returns (...) signals (...)
        uses Tr
        pre P
        post Q
    end

where Tr is the trait:

    Tr: trait
        includes Tr1, ..., Trn

A bound procedure specification, Pr, appearing in a cluster specification, Cl,

    Cl = cluster is ..., Pr, ...
        uses Tr

        Pr = proc (...) returns (...) signals (...)
            pre P
            post Q
        end
    end

translates to:

    Cl = cluster is ..., Pr, ...
        uses Tr

        Pr = proc (...) returns (...) signals (...)
            uses Tr
            pre P
            post Q
        end
    end


4.1.2 Mutates Clause 


We highlight a procedure’s potential effect of mutation of objects by lifting from the
post-condition a mutates assertion of the form mutates t1, ..., tn and setting it off as a
clause on its own. If no explicit mutates clause appears, we conjoin the mutates ∅
assertion to the post-condition.

Syntax

We modify the syntax to allow for a mutates clause:

    ProcBody ::= Triple
    Triple ::= PreC <Muts> PostC
    Muts ::= mutates Term+,


Recall that a procedure object assertion is of the form "P{Pr}Q" where P and Q are 


assertions; hence the syntax must still allow mutates assertions to appear in post-conditions. 


Translation 
A triple of the form:

    pre P
    post Q

where Q has no mutates assertion, translates to:

    pre P
    post Q ∧ mutates ∅

A triple of the form:

    pre P
    mutates Term+,
    post Q

where Q has no mutates assertion, translates to:

    pre P
    post Q ∧ mutates Term+,

4.1.3 Default Termination Condition Value


We choose normal to be the default value for the terminates object of a procedure 


specification. If no returns or signals assertion appears in a post-condition, then there is an 


implicit returns assertion in that post-condition. 
Translation 
A procedure specification of the form:

    Pr = proc (...) returns (...) signals (...)
        pre P
        post Q
    end

where Q has neither a returns nor a signals assertion, translates to:

    Pr = proc (...) returns (...) signals (...)
        pre P
        post Q ∧ returns
    end


Example 


    intersect = proc (s1: set, s2: set)
        pre true
        mutates s2
        post ∀i:Int [has(s2↓,i) = has(s1↑,i) ∧ has(s2↑,i)]
    end


This specification has an implicit used trait, a separate mutates clause, and an implicit 


termination condition value (i.e., normal). The reader should compare the above intersect 


procedure specification with that in Section 2.2.2.4. 


4.1.4 Multiple Pre- and Post- Conditions 


The behavior of a procedure can often be broken down into several cases depending on 
the input state. Demarcating these individual cases enhances the readability of the 
specification and also disciplines the specifier to consider all possible cases in a stylized way. 


We introduce the use of multiple pre- and post-conditions. 
Syntax 
We modify the syntax as follows: 
ProcBody ::= Triple + 
Translation 
A procedure specification, Pr, of the form:

    Pr = proc (...) returns (...) signals (...)
        pre P1
        post Q1
        ...
        pre Pn
        post Qn
    end

translates to:

    Pr = proc (...) returns (...) signals (...)
        pre P1 ∨ ... ∨ Pn
        post (P1 ⇒ Q1) ∧ ... ∧ (Pn ⇒ Qn)
    end


We do not require that the pre-conditions cover all cases nor that they be disjoint. 
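
The translation can be read as a small combinator over predicates. The following Python sketch is illustrative only: `combine` is a hypothetical helper, each Pi is modeled as a function of the input value, each Qi as a function of the input and output values, and the two sample cases are assumptions made for the example.

```python
def combine(cases):
    """Translate [(P1, Q1), ..., (Pn, Qn)] into a single (pre, post) pair."""

    def pre(x):
        # P1 v ... v Pn
        return any(p(x) for p, _ in cases)

    def post(x, y):
        # (P1 => Q1) ^ ... ^ (Pn => Qn)
        return all((not p(x)) or q(x, y) for p, q in cases)

    return pre, post


# Two hypothetical cases: positive inputs and negative inputs.
cases = [
    (lambda i: i > 0, lambda i, j: j == i),
    (lambda i: i < 0, lambda i, j: j == -i),
]
pre, post = combine(cases)
```

With these cases the input 0 satisfies no pre-condition, so the combined pre-condition is false there and every implication in the combined post-condition holds vacuously. This is the sense in which the cases need not cover all inputs.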


Example 


    absVal = proc (i: int) returns (j: int)
        pre i↑ > 0
        post j↓ = i↑
        pre i↑ < 0
        post j↓ = -i↑
    end

Multiple pre- and post-conditions are most useful in distinguishing among the various 
termination conditions of a procedure and in conjunction with an implicit returns assertion. 


Typically, one pre- and post-condition pair is written for each distinct termination condition. 
Example 


    choose = proc (s: set) returns (i: int) signals (isEmpty)
        pre ~isEmpty(s↑)
        post has(s↑, i↓)
        pre isEmpty(s↑)
        post signals isEmpty
    end

The reader should compare the above choose procedure specification with that in Section
2.2.2.2.
4.2 Handling Other CLU Features 


We have so far ignored the following three features of CLU: own variables, iterators, and
parameterization. We discuss an own variable as a particular kind of "memory object" in
Section 4.2.1, and the other two features in the subsequent two sections. We add some
extensions to CLU computation sequences and to procedure invocations to handle memory
and iterators, and we add a semantic check for one kind of restriction on type parameters of
parameterized specifications.


4.2.1 Memory Objects 


A procedure’s behavior may depend on the values of objects in the input state not 
explicitly bound to the formals. We call these "memory objects." In CLU, for example, an own 
variable is an object whose value is "remembered" from invocation to invocation. In other 
programming languages, a global variable is an example of another kind of memory object 


accessible from all procedures. 


We need to specify the behavior of a procedure with memory, which we cannot do in the 
framework presented so far. Hence, we extend the syntax and semantics of procedure and 
cluster specifications. We use CLU own variables to model these extensions.⁸


Specifying memory raises two problems. The first is that unlike for input and output 
formals, we need to be able to specify the possibility of changing the bindings of memory 
object identifiers. Thus far, we did not need to specify this because the effect of changing 
bindings of formals does not affect the bindings of the actuals. That is, except for own
variables, bindings from CLU program variables to objects can be changed only through CLU
assignment and not through procedure invocation. Hence, analogous to a mutates
assertion for stating a possible change to the store component of a state, we introduce a
changes assertion for stating a possible change to the environment component. One subtle
difference between changes and mutates is that whereas only terms denoting mutable
objects can follow the mutates keyword, identifiers for both immutable and mutable objects
can follow the changes keyword.


8. As a matter of programming style, the use of own variables in CLU is discouraged because they add semantic 
complexity. Their use can always be avoided by retaining state information in a "dummy" cluster; however, own 
variables are often used to save overhead in extra procedure calls. 


The second problem deals with keeping track of whether a memory object has been 
initialized. In CLU, initialization of a procedure’s memory occurs at (possibly) the procedure’s 
first invocation. It may not occur if the initialization code within the procedure is not executed 
(e.g., because of a conditional), in which case memory is left uninitialized. Hence, we 
associate with each memory object, x, an implicit memory boolean object that is initially false 


and denoted by the identifier x$init. If x$init is false, x is uninitialized; if true, x is initialized. 
Syntax 
We modify the syntax as follows: 


    ClusBody ::= <Rmbr> ProcSpec+
    ProcBody ::= <Rmbr> Quad+
    Rmbr ::= remembers RemDecl+
    RemDecl ::= ObjId: TypeSpec
    Quad ::= PreC <Chgs> <Muts> PostC
    Chgs ::= changes ObjId+,

The remembers clause simply allows the user to introduce object identifiers for memory. We
emphasize that the declaration of memory objects in a specification does not imply the use of
memory (e.g., own variables) in a corresponding implementation. As with a mutates
assertion, we make a changes assertion a separate clause in the body of a procedure
specification.
We add to the syntax of the assertion language,

    Assn ::= ... | changes ObjId+,

with truth value:

    T[changes x1, ..., xn](σ, σ', A, 2) =
        ∀y [~(y = x1) ∧ ... ∧ ~(y = xn) ⇒ (σ.e(y) = σ'.e(y))]

Checking

We check that

1. Object identifiers appearing in a remembers clause of a
procedure specification, Pr, are disjoint from Pr’s input and output
formals.


2. Object identifiers appearing in a remembers clause of a cluster 
specification, Cl, are disjoint from the sets of input formals, output 
formals, and memory object identifiers of all of Cl’s procedure 
specifications. 


3. Only memory object identifiers can appear after the changes 
keyword. 


Meaning 


We treat memory objects as implicit input and output arguments to a procedure. We
modify the structure of an operation (a relation-algebra pair) so that the domain and range of
the environment components of the input and output states of the relation include memory objects
(compare with Section 2.2.2.1) and their corresponding "init" objects. Let MemId be the set
{x | x is a memory object identifier} ∪ {x$init | x is a memory object identifier}, and let
MemObj be the set of objects denoted by identifiers in MemId.

1. dom(R) = {<D, e, s> | dom(e) = set of input formals ∪ MemId ∧
        ran(e) = set of input arguments ∪ MemObj}

2. ran(R) = {<D, e, s> | dom(e) = set of output formals ∪ MemId ∧
        ran(e) = set of output arguments ∪ MemObj}


The first equation states that the environment of each input state includes the bindings from 
memory object identifiers to memory objects and the bindings for the corresponding "init" 
objects as well as the set of bindings from input formals (object identifiers) to input arguments 
(objects). The second equation states a similar property for the environment of each output
state.

We add the following two properties to the initial state of a computation, σ0, for all
memory objects, x:

1. {x, x$init} ⊆ σ0.D
2. σ0.s(σ0.e(x$init)) = FALSE

The first property states that all memory objects and their associated boolean "init" object are
in the set of existing objects of the initial state. The second property states that the "init"
objects are initialized to the boolean value false. Notice that since x$init denotes an
immutable boolean object, it makes sense to change x$init, but not to mutate it.
Example 


    increment = proc () returns (j: int)
        uses Integer
        remembers ctr: int
        pre ctr$init↑ = false
        changes ctr, ctr$init
        post ctr↓ = 1 ∧ j↓ = 1 ∧ ctr$init↓ = true
        pre ctr$init↑ = true
        changes ctr
        post ctr↓ = ctr↑ + 1 ∧ j↓ = ctr↓
    end


The first time the increment procedure is called, the value of the integer object, ctr, is 


initialized to 1 and returned. Subsequent invocations will return successive integers. 
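
One way to make the roles of the remembers and changes clauses concrete is a small executable model. The Python sketch below is an illustration, not part of the interface language: the environment is simplified to a dictionary from memory identifiers directly to integer or boolean values (conflating binding and value, which is harmless for immutable objects), and the helper `changes` and the extra identifier `other` are hypothetical.

```python
def changes(env_before, env_after, allowed):
    """Model of `changes x1, ..., xn`: every identifier not listed keeps its binding."""
    return all(env_before[y] == env_after[y]
               for y in env_before if y not in allowed)


def increment(env):
    """One invocation of a procedure intended to satisfy the increment specification.

    env maps memory identifiers to values; returns (result j, new environment).
    """
    new_env = dict(env)
    if not env["ctr$init"]:      # first case: memory not yet initialized
        new_env["ctr"] = 1
        new_env["ctr$init"] = True
    else:                        # second case: memory already initialized
        new_env["ctr"] = env["ctr"] + 1
    return new_env["ctr"], new_env


# "other" is an unrelated memory object that no case is allowed to change.
env0 = {"ctr": None, "ctr$init": False, "other": 7}
j1, env1 = increment(env0)
j2, env2 = increment(env1)
```

The first invocation must list both `ctr` and `ctr$init` in its changes clause. Checking it against `changes ctr` alone fails, because the binding of `ctr$init` also changes.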
4.2.2 Iterators 


An iterator computes a sequence of items of objects, one item at a time, where an item is 
a set of zero or more objects. We amend our model of a computation sequence to include 
iterator invocations, which we treat similarly to procedure invocations. The only way an 
iterator can be invoked is by use of a for statement. The execution of the for statement 
includes one or more invocations of the iterator and is terminated when the iterator 


terminates. 


elements = iter (a: array[int]) yields (int)
    next: int := array[int]$low(a)        % 1
    while true do                         % 2
        yield (a[next])                   % 3
        next := next + 1                  % 4
        end                               % 5
      except when bounds: return          % 6
      end                                 % 7
end elements


flip_sign = proc (a: array[int]) returns (array[int])
    b: array[int] := array[int]$create(array[int]$low(a))
    for i: int in elements(a) do
        array[int]$addh(b, -i)
    end
    return (b)
end flip_sign


Figure 14. Elements Iterator, Implementation and Use 


An example of an elements iterator and its use are given in Figure 14. Elements 
computes a sequence of integers. The flip_sign procedure creates a new array with the same 
low bound as a, the input array, and returns an array with the signs of all the integers of a 
reversed. The first time elements is invoked, the integer at the low bound of a is yielded 
(statement 3). A subsequent invocation of elements yields the next integer of a. This process 
continues until a bounds exception is raised, in which case elements terminates (statement 


6). 
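
For readers more familiar with generators, the behavior just described has a close analogue in Python. The sketch below is an analogue of Figure 14 and not a translation of CLU semantics: Python lists always have low bound 0, and `IndexError` stands in for the bounds exception.

```python
def elements(a):
    """Yield the items of a from the low bound upward; return when out of bounds."""
    nxt = 0
    while True:
        try:
            item = a[nxt]
        except IndexError:   # plays the role of the bounds exception
            return           # the iterator terminates, ending the for loop
        yield item           # the iterator is suspended here until resumed
        nxt += 1


def flip_sign(a):
    """Return a new list holding the items of a with their signs reversed."""
    b = []
    for i in elements(a):    # the loop resumes elements once per item
        b.append(-i)
    return b
```

The two exits from `elements`, the `yield` and the `return`, correspond to the two ways control leaves the iterator in the figure: statement 3 and statement 6.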


We need to distinguish between two kinds of termination for iterators. The first is when
an iterator yields an item, completing an invocation from a for statement, e.g., statement 3 of
elements. An alternate view of this situation is that the iterator does not "terminate," but is
just in a "suspended" state. The additional piece of semantics we need for the specification
of an iterator is a special termination condition. We reserve the identifier, suspend ∈
TermCond, for the value of this termination condition, and we add a corresponding
suspends assertion to the assertion language. The second kind of termination is when the
iterator returns, causing the for statement to terminate, e.g., statement 6 of elements. As with
procedure specifications, we use the termination condition normal for this kind of 


termination. 
Syntax 


The syntax for an iterator specification is as follows: 


    IterSpec ::= IterId = IterHead <Link> IterBody end
    IterHead ::= iter Args <Yields> <Sigs>
    IterBody ::= <Rmbr> Quad+
    Yields ::= yields Args


As with a Rets clause in procedure specifications, an object identifier in a Yields clause is an 


output formal; the object it denotes is an output argument. 


Recall that we list in the header of a cluster specification the identifiers of procedure 
specifications that are specified in the body. We also include iterator specifications in a 


cluster specification. We modify the syntax as follows: 


    ClusSpec ::= TypeId = cluster is RoutId+, ClusLink ClusBody end
    ClusBody ::= RoutSpec+
    RoutId ::= ProcId | IterId
    RoutSpec ::= ProcSpec | IterSpec


A routine specification is either a procedure or iterator specification. Bound and free routine 


specifications are defined in a similar way to bound and free procedure specifications. 
We add to the syntax of the assertion language:

    Assn ::= ... | suspends

with truth value:

    T[suspends](σ, σ', A, ») = σ'.s(terminates) = suspend

Checking

The syntax-checking of the body of an iterator specification is as defined for procedure
specifications. A suspends assertion can appear only in post-conditions. We also allow the
use of all syntactic amenities introduced in Section 4.1 for iterator specifications.
Translation 
An iterator specification of the form: 


    It = iter (x1: S1, ..., xm: Sm) yields (y1: T1, ..., yn: Tn) signals (e1, ..., ep)
        uses Tr
        pre P
        post Q
    end

translates to:

    It = proc (x1: S1, ..., xm: Sm) signals (suspend (y1: T1, ..., yn: Tn), e1, ..., ep)
        uses Tr
        pre P
        post Q
    end


Example 


    tokens = iter (s: stream) yields (t: token)
        uses StreamTrait
        pre ~isEmpty(s↑)
        mutates s
        post t↓ = head(s↑) ∧ s↓ = rest(s↑) ∧ suspends

        pre isEmpty(s↑)
        post returns
    end

Each time the iterator is invoked with a nonempty input stream object, tokens mutates the
stream and yields a token from it. The specification does not forbid the possibility that s be
changed in the body of a for statement. Recall that a returns assertion in the second
post-condition is equivalent to the assertion terminates↓ = normal.
Memory Used With Iterators 


The specification of memory objects in iterator specifications requires making additions 
to our model of CLU computations. Because we are modeling each individual invocation of 
an iterator, and not each for statement that invokes an iterator, we need to be careful about 
specifying the effect of an iterator on its memory. In particular, initialization of memory for an 
iterator is done at the first invocation of that iterator in the first for statement of the 
computation that invokes it. Subsequent for statements that invoke it do not "reinitialize" 


memory. 


We distinguish a use from an invocation of an iterator, Iter. Each for statement that 
invokes Iter is a use of it. Each iteration within a for statement that uses Iter is an invocation 
of it. For example, in Figure 14, flip_sign uses elements once but invokes it (possibly) many 


times. 
Meaning 


Let first denote a special memory object that enables us to distinguish the first 
invocation of an iterator from subsequent invocations in a for statement. We view first as a 
"global" or "ghost" variable accessible in all states in a computation. At the first invocation 
of each use of an iterator, first is true; otherwise, it is false. Therefore, at the first invocation 
of an iterator of each of its uses, first is true; at each intermediate invocation of each use, 


first is false. Immediately before each use first is true. 


To achieve the desired effect of first being true before each use of an iterator, we
associate an implicit assignment statement "first := true" before the (syntactic) appearance
of each for statement in the program text. This ensures that if a statement, Si, in a
computation is the first invocation of an iterator, the value of first is true in the state preceding
Si. For a computation sequence,

    σ0 S1 σ1, ..., σn-1 Sn σn

we have:

1. first ∈ σ0.D
2. For all i ≥ 1, if Si is a first invocation of an iterator, σi-1.s(σi-1.e(first)) = TRUE;
otherwise, σi-1.s(σi-1.e(first)) = FALSE.


We extend the domain and range of the relations of all iterators to include first as we 


did for other memory objects. 
Syntax 


Since we often need to check whether or not we are at the first invocation of an iterator, 


we add to the assertion language: 
    Assn ::= ... | firstInv

with truth value

    T[firstInv](σ, σ', A, 2) = σ.s(σ.e(first)) = TRUE


We do not provide an assertion to check whether we are at the first use of an iterator for 
the same reason we do not provide an assertion to check whether we are at the first 
invocation of a procedure. The only reason we might (incorrectly) think we would need the 
ability to make these distinctions is because of the initialization of memory. Recall, however,
that initialization of memory objects is not necessarily done at the first use of an iterator or at 
the first invocation of a procedure. It is necessary only to distinguish between whether 
memory has been initialized, which we can do using the "init" boolean object associated with 


each memory object.

We do provide two implicit assertions with iterator specifications. First, note that after
the first invocation of any use of an iterator, the final value of first should be false, and after
subsequent invocations, its value can remain false. Hence, we implicitly append the assertion
first↓ = false to each post-condition of a quadruple of an iterator specification.

Second, since one of the possible effects of an iterator invocation is to change the
binding of first, we implicitly append first to the list of object identifiers of each changes
clause in each quadruple of an iterator specification. If a changes clause does not explicitly
appear, we implicitly include one in each quadruple.
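
The bookkeeping for first can be pictured with a small driver that models one for statement as a sequence of invocations. This Python sketch is an illustration under stated assumptions: `run_for` and `make_tokens` are hypothetical helpers, an invocation is modeled as a function that receives the value of first in its input state and reports either a suspend with an item or a normal return, and a list stands in for a stream.

```python
def run_for(invoke, body):
    """Drive one use of an iterator, modeled as repeated invocations.

    invoke(first_inv) returns ("suspend", item) or ("normal", None).
    """
    first = True                     # implicit "first := true" before the for statement
    while True:
        cond, item = invoke(first)   # firstInv is the value of first in the input state
        first = False                # implicit conjunct added to every post-condition
        if cond == "normal":         # the iterator returned: the for statement ends
            return
        body(item)


def make_tokens(stream):
    """Build an invocation function over a list and a log of firstInv values."""
    log = []

    def invoke(first_inv):
        log.append(first_inv)
        if stream:                           # case: stream not empty
            return "suspend", stream.pop(0)  # yield the head and mutate the stream
        return "normal", None                # case: stream empty

    return invoke, log


seen = []
invoke, log = make_tokens(["a", "b"])
run_for(invoke, seen.append)
```

In this run the log of firstInv values is true for the first invocation and false for the two that follow, including the final invocation that returns.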


Translation 
A body of the form:

    pre P
    mutates M
    post Q

where Q has no changes assertion, translates to:

    pre P
    changes first
    mutates M
    post Q ∧ first↓ = false

A body of the form:

    pre P
    changes C
    mutates M
    post Q

translates to:

    pre P
    changes C, first
    mutates M
    post Q ∧ first↓ = false


Example 


One use of memory with iterators is to specify that the initial value of an argument to the 
iterator is the same as the final value from the previous invocation. 

    elements = iter (s: set) yields (e: elem)
        uses SetOfElem
        remembers myset: set

        pre ~isEmpty(s↑) ∧ [firstInv ∨ s↑ = myset↑]
        mutates myset, s
        post has(s↑, e↓) ∧ s↓ = remove(s↑, e↓) ∧ myset↓ = s↓ ∧ suspends

        pre isEmpty(s↑) ∧ [firstInv ∨ s↑ = myset↑]
        post returns
    end

In the above elements specification, myset is a set object used to remember the value of
the set object from invocation to invocation. The s↑ = myset↑ conjunct that appears in both
pre-conditions requires that the initial value of the set object at each invocation be the same
as the "remembered" value from the previous invocation. The first triple handles the cases
when the set argument is not empty and either (1) it is the first invocation of elements, or (2) it
is not the first invocation and the initial value of s is the same as the remembered value. The
second triple handles the cases when s is either initially empty, i.e., at its first use, or becomes
empty from the previous invocation of any of its uses.
4.2.3 Parameterized Specifications 


Procedures, iterators, and clusters may all be parameterized in two ways: over certain 
types of objects and over type identifiers. We call a parameter of the first kind an object 


parameter; the second, a type parameter. An integer object parameter, n, for example, can be
used in a procedure that computes the average of a list of numbers, where n is the length of
the list. Type parameters are far more common in CLU than object parameters. A list cluster,
for example, can be parameterized over a type parameter, T, to stand for a set of clusters,
each defining a list[A] type for some actual type identifier, A. Type parameters can also have
restrictions. In Section 4.2.3.1 we discuss parameterized specifications without restrictions;
in Section 4.2.3.2 we describe the kinds of restrictions that we can impose on type
parameters.
4.2.3.1 Parameterization Without Restrictions 
Syntax 

We modify the syntax as follows: 


ProcHead ::= proc <Parms> Args <Rets> <Sigs>
IterHead ::= iter <Parms> Args <Yields> <Sigs>
ClusSpec ::= TypeId = cluster <Parms> is RoutId+, ClusLink ClusBody end
ClusMap  ::= provides MutFlag TypeId from SortId

Parms    ::= [ParmDecl+,]
ParmDecl ::= ObjId: TypeSpec | Idn: type
Where    ::= where Restriction+,


Object parameters are of the form ObjId: TypeSpec; type parameters are of the form Idn:
type. Parameters of a procedure or iterator specification should not be confused with the
input and output formals (object identifiers) of the specification, nor with objects bound to the
formals.
Checking


1. Object parameters are of only the following types: null, bool, int, 
real, char, and string. 


2. The body of a parameterized specification sort checks. For a
term, τ, denoting an object of type T, where T is a type parameter,
the sort of τ is T_obj. The sort of the terms, τ⊤ and τ⊥, is TtoS(T). As
usual, the names of these sorts must appear in the used trait.


Meaning


A model of a parameterized procedure specification is a set of operations 
(relation-algebra pairs). Each operation in the set is a model of an instantiated specification, 
obtained by textually substituting a list of actual parameters, A, for the list of (object and type) 
parameters, F, of the parameterized procedure specification. For the following parameterized 
procedure specification, 

Pr = proc [F] (InList) returns (OutList) signals (SigList)
    uses Tr
    pre P
    post Q
end


an instantiated specification is of the form:

Pr[A] = proc (InList [A/F]) returns (OutList [A/F])
            signals (SigList [A/F])
    uses Tr’
    pre P [A_obj/F_obj, TtoS(A)/TtoS(F)]
    post Q [A_obj/F_obj, TtoS(A)/TtoS(F)]
end

where Tr’ is the trait,

Tr’: trait
    includes Tr with [A_obj for F_obj, TtoS(A) for TtoS(F)]


We adopt the convention of naming each of these instantiations "Pr[A]." We do the
renamings in the pre- and post-conditions because sort identifiers can appear in quantified
expressions in the assertions. The first list of renamings handles obj sort identifiers; the
second, value sort identifiers.


A model of a parameterized cluster specification is a set of abstract data types (recall
that an abstract data type is a pair consisting of a set of objects and a set of operations). Each
abstract data type is a model of an instantiated cluster specification. For the parameterized
cluster specification (MutFlag is either the keyword mutable or immutable),


C = cluster [F] is RoutIdList
    uses Tr
    provides MutFlag C from S
    RoutSpecs
end


each instantiation is of the form:


C[A] = cluster is RoutIdList
    uses Tr’
    provides MutFlag C[A] from S
    RoutSpecs [A/F, A_obj/F_obj, TtoS(A)/TtoS(F)]
end


where again Tr’ is the trait,


Tr’: trait
    includes Tr with [A_obj for F_obj, TtoS(A) for TtoS(F)]


The first list of renamings for RoutSpecs (A/F) is used to rename type identifiers in the
headers; the second and third lists are used to rename the sort identifiers in the pre- and
post-conditions of each of the routine specifications. We adopt the convention of naming
each of these cluster specifications "C[A]." Notice that each type, C[A], maps to the same
sort identifier, S.
Example


The following is a parameterized set cluster specification: 


set = cluster [T: type] is ..., insert, ...
    uses SetOfT
    provides mutable set from ST

    insert = proc (s: set[T], t: T)
        pre true
        mutates s
        post s⊥ = add(s⊤, t)
    end
end


where the SetOfT trait is given below using the SetOfE trait of previous chapters.


SetOfT: trait
    includes SetOfE with [ST for C, T_obj for E]


An instantiation of the above parameterized cluster specification is as follows, where the
actual type identifier is int, and SetOfT’ is the SetOfT trait with int_obj substituted for T_obj.


set[int] = cluster is ..., insert, ...
    uses SetOfT’
    provides mutable set[int] from ST

    insert = proc (s: set[int], t: int)
        pre true
        mutates s
        post s⊥ = add(s⊤, t)
    end
end
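

The renamings applied to the routine specifications in an instantiation are purely textual, so
they can be sketched mechanically. The following Python fragment is only an illustration: the
function name rename_type_parameter is hypothetical, and the sketch handles just the
whole-word renamings of the type identifier and its obj sort identifier. It leaves trait names
such as SetOfT alone, since the primed trait is introduced separately.

```python
import re


def rename_type_parameter(text, formal, actual):
    """Textually substitute an actual type identifier for a formal type parameter."""
    # Rename the obj sort identifier, e.g. T_obj becomes int_obj.
    obj_sort = re.compile(r"\b" + re.escape(formal) + r"_obj\b")
    # Rename whole-word uses of the type identifier, e.g. set[T] and TtoS(T).
    type_id = re.compile(r"\b" + re.escape(formal) + r"\b")
    text = obj_sort.sub(actual + "_obj", text)
    return type_id.sub(actual, text)


print(rename_type_parameter("insert = proc (s: set[T], t: T)", "T", "int"))
print(rename_type_parameter("x: T_obj, v: TtoS(T)", "T", "int"))
```

Because the patterns match whole words only, identifiers that merely contain the letter T,
such as SetOfT or ST, are not touched.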
4.2.3.2 Parameterization With Restrictions


We often find it useful to place restrictions on type parameters. These restrictions play a
similar role to that of the assumptions of a trait in Larch. We write these restrictions in a
where clause. We modify the syntax:
Syntax 


ProcHead ::= proc <Parms> Args <Rets> <Sigs> <Where>
IterHead ::= iter <Parms> Args <Yields> <Sigs> <Where>
ClusSpec ::= TypeId = cluster <Parms> is RoutId+,
             <Where> ClusLink ClusBody end

Where ::= where Restriction+,
Restriction ::= BasicRestriction | TypeId in TypeSet
BasicRestriction ::= TypeId immutable | TypeId has RoutHead
                   | TypeId has RoutSpec
TypeSet ::= {TypeId | BasicRestriction+,}
RoutHead ::= ProcHead | IterHead


The where clause is removed upon an instantiation of a parameterized specification. The "|"
symbol⁹ in the TypeSet production is a symbol of the specification language itself and should
not be confused with the "|" symbol used as an alternative separator in the grammar.
Checking


We check that the actuals substituted for type parameters satisfy the restrictions in the
where clause. There are four kinds of restrictions on a type parameter. Three are "basic"
restrictions, two of which require only syntax checks; the third requires a semantic check.
The fourth kind of restriction is built up from these basic restrictions and hence, may also
require semantic checks. In the following discussion on these four restrictions, let T be a type
parameter, A be a type, and Cl_A be the cluster specification defining A.


9. It is a reserved symbol in CLU. 
The first kind of restriction is of the form, T immutable. To check that A satisfies this
restriction, we check that the type flag of Cl_A is immutable. It is not a kind of restriction that
can be placed on type parameters in CLU, but we include it in the specification language
because proofs (e.g., those that use the type induction principle of A) may depend on a type
being immutable.


The second kind of restriction is of the form, T has R = Sig, where R is in RoutId and
Sig is in RoutHead. To check that A satisfies this restriction, we check that Cl_A contains a
routine named R with the signature Sig.


The third kind of restriction, stricter than the second, is of the form, T has R, where R is
in RoutSpec (R includes a signature and a body). To check that A satisfies this restriction, we
check that the theory of R is a subset of the theory of A. This restriction is not present in CLU
because it involves semantic checking. The second kind of restriction is a special case of the
third where the pre- and post-conditions are both identically true.


The fourth kind of restriction is included for completeness since it is allowed, but rarely
used, in CLU. It is of the form, T in {X | X has r1, ..., rn}, where r1, ..., rn are restrictions of the
three forms just described. To check that A satisfies this restriction, we check that A satisfies
all the restrictions, r1, ..., rn.
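

The syntactic checks among these restrictions are simple enough to sketch. The Python
fragment below is only an illustration under stated assumptions: ClusterSpec is a hypothetical
summary of a cluster specification (its type flag and routine signatures), and signatures are
represented as pairs of argument and result type names. The third kind of restriction needs a
semantic check on theories and is deliberately not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class ClusterSpec:
    """Hypothetical summary of the cluster specification defining a type A."""
    name: str
    mutable: bool
    routines: dict = field(default_factory=dict)  # name -> (arg types, result types)


def is_immutable(cluster):
    """First kind, T immutable: inspect the type flag."""
    return not cluster.mutable


def has_routine(cluster, routine, signature, formal="T"):
    """Second kind, T has R = Sig: compare names and signatures only."""
    args, results = signature
    # Substitute the actual type name for the formal before comparing.
    rename = lambda types: tuple(cluster.name if t == formal else t for t in types)
    return cluster.routines.get(routine) == (rename(args), rename(results))


def in_type_set(cluster, restrictions):
    """Fourth kind, T in {X | X has r1, ..., rn}: every restriction must hold."""
    return all(restriction(cluster) for restriction in restrictions)


EQUAL_SIG = (("T", "T"), ("bool",))
int_spec = ClusterSpec("int", mutable=False,
                       routines={"equal": (("int", "int"), ("bool",))})
array_spec = ClusterSpec("array", mutable=True)

print(is_immutable(int_spec), has_routine(int_spec, "equal", EQUAL_SIG))
print(is_immutable(array_spec), has_routine(array_spec, "equal", EQUAL_SIG))
```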
Examples


set = cluster [T] is ...
    where T has
        equal = proc (t1, t2: T) returns (b: bool)
            pre true
            post b⊥ = (t1 = t2)
        end
    uses SetOfT
    provides mutable set from ST
end


The implementations that satisfy this specification would differ from those that would satisfy
a specification in which the post-condition of equal was replaced by

    post b⊥ = (t1⊤ = t2⊤)

The difference is that the first specifies that the elements to the equal procedure be the same
objects whereas the second specifies only that the elements have the same value. There
would be fewer implementations satisfying each of these two specifications than those
satisfying a specification in which we do not specify the behavior of equal at all.
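

The distinction between the two post-conditions is the familiar one between object identity
and value equality. As a loose analogy only (Python is not CLU, and the two function names
are hypothetical), it can be sketched as follows.

```python
def equal_objects(t1, t2):
    """Analogue of the first post-condition: the arguments are the same object."""
    return t1 is t2


def equal_values(t1, t2):
    """Analogue of the second post-condition: the arguments have the same value."""
    return t1 == t2


a = [1, 2]
b = [1, 2]  # a distinct object with the same value as a

print(equal_values(a, b), equal_objects(a, b), equal_objects(a, a))
```

Two distinct list objects with the same contents satisfy the value-based reading but not the
object-based one.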
5. Evaluating Specifications


In the incremental development of a large specification, providing useful feedback to a
specifier can increase his confidence that his specification is on the right track. For example,
a specifier may wish to know if his specification is in some sense "correct," i.e., that it
captures his intuition of what he is trying to specify, or that it is in some sense "good," i.e.,
that it satisfies a set of desired objective and possibly subjective properties.


We distinguish a specification from what it specifies, i.e., from the specificand set of a
specification [Guttag82]. Providing feedback to a specifier may help him better understand
both the specification and its specificand set, and consequently may cause him to modify or
improve the specification. Depending on how informative the feedback is, it may even point to
a place in the specification where an improvement can be made.


One way of providing such feedback is to provide the specifier with ways of evaluating a
specification. In this chapter, we consider two forms of evaluation: checking specifications
for various properties, and comparing specifications with respect to various qualities. For
example, we might like to check if a specification is consistent or compare the strength of two
specifications.


Checking is performed on a single specification; in Section 5.1 we discuss checking for
the following four properties: consistency, full-coverage, determinism, and protection.
Comparing is performed on two specifications; in Section 5.2 we discuss comparing two
specifications with respect to the quality strength. In Section 5.3, we discuss checking a
specification for a property, essentiality, with respect to a theory. All definitions are in terms
of theories.


We do not give an extensive enumeration of properties and qualities, but just a sample to
suggest the usefulness of evaluating specifications and to illustrate our approach. We leave
for future work the tasks of identifying and defining additional properties and qualities,
analyzing the tradeoffs among them, and finding other methods of evaluating specifications.
5.1 Properties of Specifications 


Following our specification approach, we put together pieces of existing specifications
to create a larger specification targeted for a particular problem or problem domain. As the
specification grows incrementally, we might invoke a "checker" to test for a property of the
specification. In the process of tuning a specification, we would probably invoke such a
checker many times. If a specification does not have a property, we can choose either to
modify the specification so that it does, or accept the fact that it does not--a checker is used
only to provide information, not to inhibit the progress of writing the specification. Checking
for a property might also necessitate a clarification in the client's problem statement. For
example, discovering that a specification is inconsistent may point to a contradiction in the
problem statement--the specification merely reflected the mistake. The signatures of the
properties we will discuss are shown in Figure 15.


Two properties of a specification that might be of interest are consistency and
completeness. The ability to check for consistency is probably of more use than the ability to
check for completeness. Knowing a specification is inconsistent informs the specifier that no
implementation could be written to satisfy the specification. We define consistency in Section
5.1.1.


consistent: trait → boolean
consistent: procedure specification → boolean
consistent: cluster specification → boolean

fully-covering: procedure specification → boolean
fully-covering: cluster specification → boolean

deterministic: procedure specification → boolean
deterministic: cluster specification → boolean

protective: procedure specification → boolean
protective: cluster specification → boolean

Figure 15. Signatures of Properties


We do not define completeness because we expect most specifications to be incomplete
in the logical sense¹⁰ as well as in the practical sense--in the development of a large
specification, we may have no intention of ever "finishing" it. We usually want to know when
we have said "enough" as opposed to "everything." In Sections 5.1.2-5.1.4 we define three
properties: full-coverage, determinism, and protection. Each gets at a different notion of
sufficiency as a different kind of approximation to completeness.


For each property, we first motivate it, then define it, and then discuss specifications
with that property. When we define each property we also motivate our definition.
5.1.1 Consistency 
5.1.1.1 Definition 


The usual notion of consistency of a formal system refers to the inability to derive an
explicit contradiction. For a given first-order predicate logic formal system, a set of formulae,
p, is inconsistent if and only if for some A, both A and ~A are theorems in p. Equivalently, p
is inconsistent if and only if false is in p. We will use the second definition to build the notion
of an inconsistent specification.


Def: A trait, Tr, is inconsistent if and only if the formula (true = false) or the formula false is in
Th(Tr).


Def: A procedure specification, Pr, is inconsistent if and only if (1) there exists a satisfiable
formula P such that the formula P{Pr}false is in Th(Pr), or (2) Pr's used trait is inconsistent.


Def: A cluster specification, Cl, is inconsistent if and only if (1) true{S}false is in Th(Cl), or (2)
for any of Cl's procedure specifications, Pr, there exists a satisfiable formula P such that the
formula P{Pr}false is in Th(Cl), or (3) Cl's used trait is inconsistent.


10. Given a formal system, its theory is complete if for all formulae, F, we can determine whether F or ~F is in the 
theory. 
Def: A specification is consistent if and only if it is not inconsistent.


Checking for consistency is in general undecidable since first-order logic is
undecidable. Under certain conditions, however, we may be able to show that a specification
is consistent or inconsistent. For example, for equational theories, on which trait theories are
based, a semi-decision procedure exists that checks for inconsistency by generating the
contradiction true = false (and checks for consistency by generating true) for some sets of
equations when treated as sets of rewrite rules [Knuth69, Musser77].


From the way we construct procedure and cluster specifications, it would be useful to
know under what conditions putting smaller consistent pieces together results in a
specification that is guaranteed to be consistent, or, on the other hand, to know when
inconsistencies may be introduced.


A procedure or cluster specification cannot add formulae that would be inconsistent
with a consistent used trait. The theory of a procedure specification is a conservative
extension of the theory of its used trait; it adds formulae only of the form P{Pr}Q, and none of
the form t1 = t2 or ∀x:S P(x). Therefore, the procedure specification cannot add the formula
true = false or false, either of which would be inconsistent with a consistent trait.


To check a procedure specification for consistency, if the used trait is consistent, we
need to check only that no formula P{Pr}false, where P is a satisfiable predicate, is in Th(Pr).
Notice also we define inconsistency of a procedure specification in terms of Th(Pr) and not
Th(Pr+) so as not to include the theory of the defined type when Pr is a bound procedure
specification. Since the theory of a cluster specification is defined in terms of the theories of
its procedure specifications, we avoid a circular definition.


To check a cluster specification for consistency, if the used trait is consistent, we need
to check that each bound procedure specification is consistent and that their union is
consistent (both cases covered by clause 2 of the definition of an inconsistent cluster
specification), and that the addition of the type induction principle for the defined type does
not introduce any inconsistencies (covered by clause 1). This matches our intuition since
even if the theories of the procedure specifications are individually consistent, their union may
not be; moreover, an additional rule of inference may be used to introduce an inconsistency.
5.1.1.2 Consistent Specifications 


Consistency is a desirable property of all specifications. Inconsistent specifications are
more common than one might imagine, as the following example illustrates.


intersect = proc (s1, s2: set) returns (s3: set)
    uses SetOfint
    pre true
    post ∀i:int [has(s3⊥,i) = has(s1⊤,i) ∧ has(s2⊤,i)]
end


Suppose intersect is a free procedure specification. We show that Th(intersect) is
inconsistent, given the set cluster specification is SetClusSpec. It is inconsistent because
there is no set object that can be returned as the intersection of disjoint input arguments.
Notice that step 5 uses the theorem, true {intersect} ∀s:set card(s⊥) > 0, from Th(set)
derivable from the type induction principle for sets.


1. Let s1⊤ = add(empty,1) ∧ s2⊤ = add(empty,2).
2. true {intersect} ∀i [has(s3⊥,i) = has(s1⊤,i) ∧ has(s2⊤,i)]
       --axiom of Th(intersect)
3. true {intersect} ∀i [has(s3⊥,i) = has(add(empty,1),i) ∧ has(add(empty,2),i)]
       --simplified invocation rule with the substitution as indicated
4. true {intersect} card(s3⊥) = 0
       --∀x:SI [(∀i:Int has(x,i) = false) ⇒ card(x) = 0] ∈ Th(SetOfint)
5. true {intersect} ∀s:set card(s⊥) > 0
       --Induction rule from Th(set)
6. true {intersect} ∀s:set card(s⊥) > 0 ∧ card(s3⊥) = 0
       --conjunction of two post-conditions (Hoare proof rule)
7. true {intersect} false
       --Let s = s3.


Notice that if intersect were bound, it would be consistent because the theorem of step 5
would no longer hold. Th(set) would be different (e.g., we could construct an empty set
object) because it would include Th(intersect) and so set's type induction principle would
have a weaker form.
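

The argument can be replayed informally on a small finite model. The Python sketch below is
an illustration, not a proof: the three-element UNIVERSE stands in for the integers, and the
allow_empty flag is an assumption used to model whether an empty set object can exist (the
free case, where every set object has positive cardinality, versus the bound case).

```python
from itertools import combinations

UNIVERSE = (1, 2, 3)  # finite stand-in for the integers


def candidate_sets(allow_empty):
    """Enumerate the set values that some set object could have."""
    start = 0 if allow_empty else 1
    for size in range(start, len(UNIVERSE) + 1):
        for members in combinations(UNIVERSE, size):
            yield frozenset(members)


def post(s1, s2, s3):
    """Post-condition of intersect, with the quantifier ranging over UNIVERSE."""
    return all((i in s3) == (i in s1 and i in s2) for i in UNIVERSE)


def possible_results(s1, s2, allow_empty):
    """All values of s3 that satisfy the post-condition."""
    return [s3 for s3 in candidate_sets(allow_empty) if post(s1, s2, s3)]


disjoint = (frozenset({1}), frozenset({2}))
print(possible_results(*disjoint, allow_empty=False))
print(possible_results(*disjoint, allow_empty=True))
```

With no empty set object available, no candidate satisfies the post-condition for disjoint
arguments; once the empty set is admitted, exactly one does.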
5.1.2 Full-Coverage 


In this section and the next two, we will define three properties that are related to the
"completeness" property of a specification. These three represent examples of the kinds of
approximations to completeness a specifier might want to check of his specification.


A common error in programming is forgetting to cover all the cases. As a result, a
program may behave in an erroneous or surprising manner on some inputs. We would like to
be able to prevent the occurrence of these errors before coding begins, i.e., in the design
phase, by making sure our specification covers all the cases that can arise. For example, the
following specification,


search = proc (a: array, e: elem) returns (index: int)
    uses ArrayOfElem
    pre isSorted(a⊤)
    post e⊤ = fetch(a⊤, index)
end


is not fully-covering because the case for the unsorted array is not covered. A checker for
full-coverage invoked on search might prompt us to add another pre/post pair to handle the
unsorted array.


Unlike consistency, however, full-coverage is not always desired. We may intentionally
want to leave some cases unspecified because we know they will never arise or because we
want to let the programmer decide how to handle them. In the example above, we may
decide not to add another pre/post pair if we expect search to be invoked always with a
sorted array.
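

A rough, test-style approximation of such a checker is to enumerate small inputs and report
those for which no pre-condition holds. The Python sketch below is an assumption-laden
illustration: it works on a finite sample of arrays rather than on theories, and the names
uncovered_inputs and is_sorted are hypothetical.

```python
from itertools import product


def is_sorted(a):
    """Stand-in for the isSorted function of the used trait."""
    return all(a[k] <= a[k + 1] for k in range(len(a) - 1))


def uncovered_inputs(preconditions, inputs):
    """Return the sampled inputs for which no pre-condition holds."""
    return [x for x in inputs if not any(pre(x) for pre in preconditions)]


# All arrays over {0, 1} of length 0, 1, and 2.
arrays = [list(p) for n in range(3) for p in product((0, 1), repeat=n)]

print(uncovered_inputs([is_sorted], arrays))
print(uncovered_inputs([is_sorted, lambda a: not is_sorted(a)], arrays))
```

On this sample the single pre-condition of search leaves the unsorted array uncovered, and
adding a second pre-condition for the unsorted case removes the gap.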
5.1.2.1 Definition


We want the definition of full-coverage to capture the notion that the behavior of a
procedure is specified for all "reachable" input states. In terms of models, a procedure is
fully-covering if the domain of the input-output relation of any operation modeling a
procedure is the entire set of states, Σ(Val). One way of capturing the notion of full-coverage
of a procedure specification in terms of theories is that if the pre-condition of the procedure
specification is equivalent to true, then the relation is defined for all input states, and so the
procedure specification is fully-covering. That is,


Def: A procedure specification, Pr, is fully-covering if and only if true {Pr} Pr.post is in
Th(Pr+).


Def: A cluster specification is fully-covering if and only if all its procedure specifications are 
fully-covering. 


5.1.2.2 Fully-Covering Specifications 


A specification may not appear to be fully-covering when it is. Consider SetClusSpec, in
which each of its procedure specifications, in particular, delete, is fully-covering. Although
the disjunction¹¹ of delete's pre-conditions is not identically true, it is provably true from
Th(set), which is contained in Th(delete+). The proof that delete is fully-covering would use
the theorem, true {S} ∀x:set card(x⊥) > 0, which comes from the type induction principle for
SetClusSpec.


In practice, writing a procedure specification that is fully-covering is similar to
generating sufficient test cases for a program [Goodenough75, McMullin82]. A helpful
guideline to follow is for the specifier to use, in a stylized manner, multiple
pre/changes/mutates/post quadruples in conjunction with signals assertions (for multiple
termination conditions) to cover all the cases. If one pre-condition places a restriction on the
input state, then other pre-conditions should cover the cases for which the restriction does
not hold. For each separate case, there is typically a different termination condition. As a
result, the behavior of the procedure is "fully" specified.


11. Recall from Chapter 4 that the appearance of multiple pre-conditions translates to the disjunction of all the
pre-conditions.
5.1.3 Determinism 


In specifying a program, it is not always easy to separate decisions that should be made
at design time from those that should be delayed to implementation time. A specification
should impose as few constraints as possible to avoid unnecessarily overspecifying the
behavior of the program. An intentional lack of constraint can be regarded as an intentional
incompleteness.


Nondeterminism gets at the notion of introducing an intentional incompleteness in a
specification. It says that the values of input and output objects of a procedure specification
are not predictable in the final state. A nondeterministic specification allows the implementor
the freedom to choose the most convenient (e.g., efficient to implement) values. For example,
in implementing a choose procedure for sets, returning the last integer inserted may be more
efficient than returning the largest integer.


In contrast, determinism requires that the final values of the input and output objects be
predictable. Whereas the fully-covering property deals with the "completeness" of a
specification with respect to input states, determinism deals with it with respect to output
states.
5.1.3.1 Definition 


A specification is deterministic if for each state that satisfies the pre-condition, only one
set of final values for the input and output objects satisfies the post-condition. We define this
property in terms of theories, analogously to the usual definition for a function. A relation, f,
on X × Y is a partial function if for all x ∈ X, y1, y2 ∈ Y [(<x, y1> ∈ f ∧ <x, y2> ∈ f) ⇒ y1 = y2]. For
determinism, we require the relation between the values of input and output objects defined
by a procedure specification to be a partial function.


Let X be the list of input formals and Y be the list of output formals for the procedure
specification Pr. To simplify the following discussion and definitions, we will treat memory
objects as (implicit) input objects and require that all memory object identifiers be included in
X. All formals in the signals clauses are included in Y (by definition). Let Pr.pre(X⊤) be the
pre-condition on the initial values of input objects, and Pr.post(X⊤, X⊥, Y⊥) be the
post-condition on the initial and final values of input and output objects.


Def: A procedure specification, Pr, is deterministic if and only if Th(Pr+) contains the
following formula:


∀ A, A1, A2: T-in, B1, B2: T-out
    Pr.pre(A⊤) ⇒
        [Pr.post(A⊤, A1⊥, B1⊥) ∧ Pr.post(A⊤, A2⊥, B2⊥)] ⇒
            A1⊥ = A2⊥ ∧ B1⊥ = B2⊥.


where T-in is the list of types of the input objects and T-out is the list of types of the output
objects.


Def: A cluster specification is deterministic if and only if all of its procedure specifications are 
deterministic. 


Def: A specification is nondeterministic if it is not deterministic. 


Recall that a state consists of not only a store (mapping from objects to values), but also
a set of (existing) objects, and an environment (mapping from object identifiers to objects).
The definition of deterministic places no constraints on the set of objects or the environment
of the final states. A more restrictive definition could require that for each input state in which
the pre-condition is satisfied, there exists a unique output state in which the post-condition is
satisfied--restricting the set of output states satisfying a post-condition to be a singleton set.
We see no reason, however, to rule out a procedure that may, for example, create in the
process of execution new objects that may be inaccessible upon termination of the
procedure. Similarly, we should not rule out a procedure that may change the bindings of its
formals since those changes are not observable outside the procedure. In these cases, the
sets of objects or the environments of the possible output states satisfying the post-condition
may differ.
5.1.3.2 Deterministic Specifications 


A specifier may intend a specification to be deterministic or not. A procedure
specification may turn out to be nondeterministic because of an unintentional oversight on
the part of the specifier. The following procedure specification,


choose1 = proc (s: stack) returns (i: int)
    uses StackOfint
    pre ~isNull(s⊤)
    mutates s
    post i⊥ = top(s⊤)
end


is nondeterministic--the final value of s is indeterminate because of the presence of the
mutates clause. To make choose1 deterministic, the specifier could add the conjunct s⊥ =
pop(s⊤) to the post-condition, or remove the mutates clause. On the other hand, the
specifier may have intended to let the implementer decide whether or not to pop the stack,
and therefore may have intended choose1 to be nondeterministic.


Checking for determinism requires showing that a formula is in a theory; checking for
nondeterminism, that it is not. A specifier could show the latter by assuming the formula is in
the theory and finding a contradiction to show otherwise. For example, the following
procedure specification,


choose2 = proc (s: stack) returns (i: int)
    uses StackOfint
    pre ~isNull(s⊤)
    post isin(s⊤, i⊥)
end


is nondeterministic. Suppose


∀s:stack, i1,i2:int
    ~isNull(s⊤) ⇒
        [isin(s⊤,i1⊥) ∧ isin(s⊤,i2⊥) ∧ mutates ∅] ⇒
            [i1⊥ = i2⊥]


is in Th(choose2+). Then let s⊤ be push(push(null, 5), 7), i1⊥ be 5, and i2⊥ be 7 to derive a
contradiction.
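

The same counterexample can be found mechanically on a small finite model. The Python
sketch below is only an illustration under assumptions: stacks are modeled as tuples with the
top at the right, an outcome is a pair of the final value of s and the returned integer, and the
three post-conditions are informal transcriptions of choose1, choose1 with the added
conjunct, and choose2.

```python
def outcomes(pre, post, inputs, outputs):
    """Map each input satisfying pre to the outputs allowed by post."""
    return {x: [y for y in outputs if post(x, y)] for x in inputs if pre(x)}


def is_deterministic(pre, post, inputs, outputs):
    """Partial-function check: at most one outcome per admissible input."""
    return all(len(ys) <= 1 for ys in outcomes(pre, post, inputs, outputs).values())


STACKS = [(), (5,), (5, 7)]
OUTPUTS = [(s_final, i) for s_final in STACKS for i in (5, 7)]

nonempty = lambda s: len(s) > 0
# choose1: mutates s, and the post-condition says nothing about the final value of s.
choose1_post = lambda s, out: out[1] == s[-1]
# choose1 with the added conjunct that the final value of s is pop of the initial value.
choose1_fixed_post = lambda s, out: out[1] == s[-1] and out[0] == s[:-1]
# choose2: no mutates clause, so s is unchanged; i is any element of s.
choose2_post = lambda s, out: out[0] == s and out[1] in s

print(is_deterministic(nonempty, choose1_post, STACKS, OUTPUTS))
print(is_deterministic(nonempty, choose1_fixed_post, STACKS, OUTPUTS))
print(outcomes(nonempty, choose2_post, STACKS, OUTPUTS)[(5, 7)])
```

For the stack holding 5 and 7, choose2 admits two outcomes, which matches the
contradiction derived above.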
5.1.4 Protection 


By partitioning a specification into two tiers, we can avoid at the top tier an
incompleteness at the bottom tier. In particular, a procedure specification should be able to
use a trait even if the trait is not sufficiently-complete [Guttag75]. It is the procedure
specification's responsibility to protect any of its users from the incompletenesses of the trait
by ensuring that the meaning of the procedure specification is independent of those
incompletenesses.


Axioms of the form "τ exempt" are included in a trait to inform the specifier of an
intentional incompleteness. We would like to ensure such incompletenesses do not show
through to the interface level. For example, since the axiom top(null) exempt is in the
StackOfint trait, the following procedure specification is not protective.


read1 = proc (st: stack) returns (i: int)
    uses StackOfint
    pre true
    post i⊥ = top(st⊤)
end


If the initial value of st were null, then the incompleteness of the stack trait would show
through to the interface level because the value of the integer returned would be denoted by
the exempt term top(null).


Factoring a specification into two tiers allows us to factor our checks as well. If upon
checking a trait for sufficient-completeness, we discover it is not sufficiently-complete, we
may be inclined to invoke our checker for protection. For example, invoking a checker for
protection on read1 might cause us to modify it to be:


read2 = proc (st: stack) returns (i: int)
    uses StackOfint
    pre ~isNull(st⊤)
    post i⊥ = top(st⊤)
end


Read2's pre-condition is sufficiently strong so that the value of the returned integer object
would never be denoted by the term top(null); hence, the incompleteness at the trait level
would not show through to the interface level.
5.1.4.1 Definition 


We say that a procedure specification is protective if it is independent of the set of
exempt terms of its used trait. We build up to the definition of protection by first
characterizing the set, E(Tr), of exempt terms of a trait, Tr, and then defining "independent of
a set of terms."


Def: For a trait, Tr, the set, E(Tr), of exempt terms of Tr is


E(Tr) = {t | ∃t' ∃u such that (t' = u) ∈ Th(Tr), where t' is a subterm of t,
         and u is an instantiation of a term appearing exempt in Tr}


E(Tr) includes all terms that have a subterm that is provably equal to an instantiation of
an exempt term. For example, for the StackOfE trait (Appendix I, Figure 13), E(StackOfE) =
{top(null), pop(null), size(top(null)), top(pop(push(null,e))), ...}. E(Tr) does not include terms
about which the trait does not say anything. For example, if the last equation in StackOfE
were removed, it then would not constrain the term size(push(s,e)). The reason we do not
include these kinds of terms in E(Tr) is that given a set of axioms in a trait, we cannot, in
general, generate all the terms that are "intentionally" and "implicitly" not constrained. It is
easy, however, to know what terms are explicitly exempt.


We now give the definition of "independent of a set of terms." Intuitively, it captures the
notion of never having to deal with certain terms. We follow it with the definition of protection.


Def: Let S be a set of terms. An assertion, A, appearing in Pr is independent of S, if
    1. No subterm of A is in S, or
    2. ∃B ([A = B] ∈ Th(Pr)), and B is independent of S.


Def: Pr is protective if
    1. Pr.pre is independent of E(Tr), and
    2. Pr.pre ⇒ Pr.post is independent of E(Tr).


Def: A cluster specification is protective if each of its procedure specifications is protective. 
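

Membership in E(Tr) can be approximated mechanically for a small trait. The Python sketch
below is an illustration under explicit assumptions: terms are nested tuples, the only exempt
terms are top(null) and pop(null), and provable equality is approximated by rewriting with two
equations assumed to hold in the stack trait, pop(push(s,e)) = s and top(push(s,e)) = e. It
is not a general decision procedure.

```python
NULL = ("null",)
EXEMPT = {("top", NULL), ("pop", NULL)}  # assumed exempt terms of the stack trait


def normalize(term):
    """Rewrite bottom-up with pop(push(s,e)) = s and top(push(s,e)) = e."""
    if isinstance(term, str):  # a variable such as "e"
        return term
    op, *args = term
    args = [normalize(a) for a in args]
    if op in ("pop", "top") and not isinstance(args[0], str) and args[0][0] == "push":
        _, s, e = args[0]
        return s if op == "pop" else e
    return (op, *args)


def subterms(term):
    """Yield the term itself and all of its subterms."""
    yield term
    if not isinstance(term, str):
        for a in term[1:]:
            yield from subterms(a)


def in_exempt_set(term):
    """True if some subterm rewrites to an exempt term."""
    return any(normalize(sub) in EXEMPT for sub in subterms(term))


print(in_exempt_set(("top", ("pop", ("push", NULL, "e")))))
print(in_exempt_set(("top", ("push", NULL, "e"))))
```

Each subterm is normalized separately, so a term such as top(push(pop(null),e)) is still
reported even though the whole term rewrites to e.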
5.1.4.2 Protective Specifications 


Protection is a desirable property of an interface specification. The specification should
not be dependent on properties of the values denoted by exempt terms, and in reasoning
about it the specifier does not want to be "stuck" with terms that are exempt. If upon
checking to see if a specification is protective, we find that it is not, we may be able to find the
dependency in the specification and then fix the specification to remove it.


Checking may require some cleverness on the specifier's part. It may involve finding an 
assertion equivalent to the one being shown independent of a set of exempt terms. 
Checking that the pre-condition is protective is usually easy because pre-conditions are 
usually simple. Checking the post-condition, however, is likely to be more difficult. Consider 
again the following example: 


read2 = proc (st: stack) returns (i: int) 
    uses StackOfint 
    pre ~isNull(stt) 
    post it = top(stt) 
    end 


To show that read2 is protective, we show that it is independent of the set of terms 
E(StackOfint). 


1. Show ~isNull(stt) is independent of E(StackOfint). Trivial. 


2. Show ~isNull(stt) ⇒ it = top(stt) is independent of E(StackOfint). 
Referring to part (2) of the definition of when an assertion is independent of 
a set of terms, let B be [isNull(stt) ∨ ∃s1:Sl, i1:int [stt = push(s1,i1) ∧ it = 
i1]]. 


In practice, writing a protective procedure specification is straightforward provided that 
the trait is actually strong enough to specify the desired properties. Strong enough 
pre-conditions are written to make sure that even if a post-condition alone is not independent 
of an exempt term, the assertion "Pre => Post" is. Often enriching the set of functions of the 
used trait makes it easier to read and write pre-conditions to handle these cases. For 
example, the function isNull is included in the StackOfint trait instead of writing in the 
pre-condition the equivalent assertion, ~(stt = null). 
5.2 Comparing Specifications 


In the context of developing a large specification, one kind of evaluation we intend to 
perform is to compare specifications. For example, we might want to compare specifications 
with respect to their restrictivity, concision, elegance, or lucidity. (Judging a specification for 
some of these qualities is purely subjective, e.g., elegance and lucidity, and so we would not 
attempt to define these qualities formally.) We might invoke a "comparator" to compare 
specifications with respect to these qualities. As with checkers, we would invoke a 
comparator many times during the development of the specification. Comparators can be 
used to help us decide between two specifications. For example, we often want to choose the 
less restrictive (constraining) of two specifications. Comparators can also be used to check 
whether a change we make to a specification had some expected or unexpected effect on 
one of its qualities. For example, if we add something to a specification, we might like to know 
whether we have made it more restrictive or left its restrictivity unchanged. 


We discuss comparing specifications with respect to one quality, strength, of which 
restrictivity is a special case. Figure 16 gives the signatures of the corresponding 
comparators. In Section 5.2.1 we motivate comparing the relative strength between 
specifications. In Section 5.2.2 we define strength. In Section 5.2.3 we discuss the effect 
certain modifications to a specification have on its strength. 
5.2.1 Comparing Strength 


Intuitively, the stronger or more restrictive a specification, the fewer the number of 
implementations that satisfy it. In writing a specification, we may want to know whether one 
specification is as strong as or stronger than another. We may discover that after modifying a 


specification the new one is incomparable to the original. 


There are at least two situations in which it is useful to know when a specification is as 
strong as another. One is where we modify a specification but want to ensure its strength is 
unchanged. For example, if we rename identifiers of a specification in order to have 
mnemonic names, we would want to make sure we have made only a syntactic and not a 
semantic change. A second situation is in determining if it is permissible to replace a 
specification with another without affecting any of its users. If one specification is as strong 
as another, then under certain circumstances we should be able to substitute one for the 
other. Comparing the strengths of the two specifications can help determine legality of 
replacement. This situation is addressed in [Bloom83] in the context of distributed programs. 


as strong as: specification, specification → boolean 
stronger: specification, specification → boolean 
restrictive: specification, specification → boolean 


Figure 16. Signatures of Comparators 


Sometimes, we may want a stronger specification. We might realize the specification is 
not strong enough in trying to prove a property of the specification or its specificand set. We 
could choose to either weaken the statement of what we were trying to prove or strengthen 
the specification. If we were to decide to strengthen the specification, we might want to 
compare the new and original specifications to make sure we did not make them 
incomparable. For example, if we were unsuccessfully trying to prove a cardinality property 
about sets based on a specification for bags, we might realize that either our axioms are not 
sufficient to prove it or that they are wrong. We might choose to strengthen the specification 
for bags to obtain one for sets that allows us to prove the desired cardinality property. When 
we discuss the essentiality of a specification in Section 5.3, we rely on the notion of strength 
in determining whether a specification is strong enough to prove some property. 
5.2.2 Definition of Strength 


The intuition we want to capture formally is that the stronger the specification, the fewer 
the number of implementations that satisfy it. We borrow the analogous concept from logic 
that the stronger a theory, the fewer the number of models that satisfy it, and define a strength 
relation between specifications in terms of strength between their theories. For example, the 
theory of <Z, +, -> is as strong as <N, 0, succ>, but not vice versa, where Z is the set of all 
integers, and N is the set of all natural numbers. 


We could define a theory, Th1, to be as strong as or stronger than another theory, Th2, if 
the two theories are in the same language and Th2 ⊆ Th1. Theory containment, however, is 
not sufficient to capture the notion of relative strength between two theories for two reasons. 
The first is that the two theories may be in different languages; thus, they may be disjoint, but 
still be as strong as each other. The second is that even if the two theories are in the same 
language, a formula that is in Th1, but not in Th2, may be translatable to one in Th2; thus Th1, 
although larger, may not be stronger than Th2. 


In general, even if the theories are in different languages, there may exist a way of 
translating from one language to the other such that theorems of Th1 are translations of 
theorems of Th2. One reasonable way of translating from one language, L1, to another, L2, is 
to map symbols of L1 to those of L2. Mapping symbols is not sufficient because in some 
cases we could then show that one theory is stronger than another when they really are as 
strong as each other. For example, adding a new function symbol to L1 to obtain L2 may not 
strengthen Th1 because the new function symbol can be defined in terms of symbols of L1. 
We will give an example of this situation in the next subsection. 


Therefore, more generally, determining when one theory is as strong as another 
depends on finding an interpretation that translates formulae of one theory into those of 
another. Most of the following definitions are adapted from [Enderton72]. Notice that an 
interpretation is a generalization of the notion of theory morphisms from algebraic theories 


[Burstall80, Burstall81] to theories in full first-order logic with equality. 


Let Th1 be a theory in a language L1 and Th2 be a theory in a (possibly different) 
language L2.¹² Let π be a mapping from L1 into L2. 


Def: If ∀σ∈L1 [σ ∈ Th1 ⇒ π(σ) ∈ Th2], then π is an interpretation of Th1 into Th2. 

Def: Th1 is as strong as Th2 if there exists an interpretation of Th2 into Th1. 

Def: Th1 is stronger than Th2 if Th1 is as strong as Th2 and Th2 is not as strong as Th1. 


Def: Th1 and Th2 are incomparable if Th1 is not as strong as Th2 and Th2 is not as strong as 
Th1. 


Def: If Th1 and Th2 are in the same language, Th1 is more restrictive than Th2 if Th1 is 
stronger than Th2. 
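

To make the shape of these definitions concrete, the following Python sketch checks them on 
a deliberately tiny model. It is an illustration under strong assumptions, not a decision 
procedure: a theory is a finite set of formulas (real theories are infinite), a formula is a 
nested tuple of symbols, and the translation used is a symbol renaming, which the text above 
notes is only one restricted kind of interpretation. All names are hypothetical. 

```python
# Toy model: a theory is a finite set of formulas, and a formula is a nested tuple
# of symbols. Real theories are infinite, so this is only an illustration of the
# definitions; all names below are hypothetical.

def symbol_renaming(symbol_map):
    """Build a translation that renames symbols and leaves the rest unchanged."""
    def translate(formula):
        if isinstance(formula, tuple):
            return tuple(translate(part) for part in formula)
        return symbol_map.get(formula, formula)
    return translate


def is_interpretation(translate, th_from, th_into):
    """Every formula of th_from must translate to a formula of th_into."""
    return all(translate(formula) in th_into for formula in th_from)


def as_strong_as(th1, th2, candidates):
    """Th1 is as strong as Th2 if some candidate interprets Th2 into Th1."""
    return any(is_interpretation(c, th2, th1) for c in candidates)


def stronger(th1, th2, into_th1, into_th2):
    """Only conclusive relative to the candidate translations that are supplied."""
    return as_strong_as(th1, th2, into_th1) and not as_strong_as(th2, th1, into_th2)


th_a = {("=", ("top", ("push", "s", "e")), "e")}
th_b = {("=", ("peek", ("add", "s", "e")), "e"),
        ("=", ("count", ("empty",)), "0")}

a_to_b = symbol_renaming({"top": "peek", "push": "add"})
b_to_a = symbol_renaming({"peek": "top", "add": "push"})

print(as_strong_as(th_b, th_a, [a_to_b]))            # True
print(stronger(th_b, th_a, [a_to_b], [b_to_a]))      # True
```

Note that a negative answer from stronger only rules out the candidate translations that 
were supplied; establishing that no interpretation exists at all is the hard part. 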


12. L2 must include equality for technical reasons. 


We extend the last four definitions to two specifications in the obvious way. For 
example, given two specifications, Spec1 and Spec2, Spec1 is as strong as Spec2 if 
Th(Spec1) is as strong as Th(Spec2). 


Showing that Th1 is as strong as Th2 requires showing the existence of an interpretation 
from L2 into L1. Showing that Th1 is stronger than Th2 is much harder; it requires showing 
not only the existence of an interpretation from L2 into L1, but also that there does not exist 
any interpretation from L1 into L2. Notice that showing that Th1 is not stronger than Th2 is 
easier than showing Th1 is stronger than Th2 since for the former it suffices to show the 
existence of an interpretation from L1 into L2. 


Finding an interpretation or showing the nonexistence of one is difficult in general. If we 
were to base our definition of strength on the simpler, but more restricted, definition of an 
interpretation that is defined to map symbols of one language into those of another, then it 
would be easier to find an interpretation or show the nonexistence of one when comparing 
relative strengths of specifications. As previously mentioned, the alternate definition may be 
simpler, but it does not capture the strength relation we want. 


Finally, showing that two theories are incomparable requires showing the nonexistence 
of interpretations between the two languages in both directions. In some cases, however, to 
convince ourselves of incomparability, it suffices to show that there is a formula in L1 ∩ L2 that 
is in Th1 and not in Th2, and a formula in L1 ∩ L2 that is in Th2 and not in Th1. For interface 
specifications, the language of a shared trait can often be used as a basis for L1 ∩ L2. We give 
an example of this situation in the next section. 
5.2.3 Modifying a Specification With Respect To Strength 


It would be useful to characterize changes we can make to a specification by their effect 
on the strength of the original specification. Adding equations, reduces clauses, or closes 
clauses can strengthen a trait. Selecting a stronger used trait, or changing its pre- or 


post-condition can strengthen a procedure specification. 


To strengthen a cluster specification, we could select a stronger used trait or add a 
procedure specification. Adding a procedure specification does not necessarily strengthen a 
cluster specification. Doing so might leave the strength of the cluster specification 
unchanged or weaken it. It might even make the original and new cluster specifications 
incomparable because type induction rules of the original cluster specification might become 
invalid. We later give examples of each of these cases. 


The kind of procedure specification that is added to a cluster specification can restrict 
the possible effects on its strength. If T is the type defined by the cluster specification, a 
procedure specification can be classified according to whether it specifies a procedure to 
construct or to observe objects of type T. A constructor returns or mutates objects of type T 
while an observer returns or mutates objects of type other than T. Using the terminology from 
Chapter 3, we can further classify constructors into basic, producing, and mutating 
constructors. In general, a procedure specification might both construct and observe objects 
of type T, as well as do combinations of all three kinds of construction. For the present 
discussion, we only consider the "pure" cases in which a procedure specification specifies 
either the construction or observation of objects of type T, but not both. For example, a "pure 
observer" specifies that a procedure takes in objects of type T, does not mutate any objects, 
and only returns objects other than type T. Figure 17 shows the possible effect adding a pure 
constructor or observer has on the strength of a cluster specification. 


              stronger   as strong as   incomparable   weaker 
constructor      ?           yes            yes          yes 
observer        yes          yes            no           no 


Figure 17. Effect of Adding a Constructor or Observer on Strength 


Adding any kind of "pure constructor" has the possible effect of leaving the original 
specification unchanged, making it incomparable to the new, or weakening it. We conjecture 
that adding a constructor cannot strengthen a cluster specification because adding a 
constructor adds a hypothesis to each of the type induction rules. Adding a hypothesis to a 
rule might leave unchanged, weaken, or invalidate an existing rule; it cannot allow us to 
conclude a stronger invariant. We leave the proof of our conjecture as an open problem. 


We now give some examples. Let Spec1 be SetClusSpec and Spec2 be the result of 
adding a constructor to Spec1. As an example of adding a constructor that leaves a 
specification's strength unchanged, consider adding a pair procedure specification that takes 
in two (possibly equal) integers, i and j, and returns a set that is the union of {i} and {j}. Since 
formulae involving pair can be expressed in terms of singleton and union, no theorems of 
Th(Spec1) are invalidated and no new theorems are added. If, however, we had chosen our 
alternate definition that defines an interpretation to map between symbols, then adding the 
identifier, "pair," would strengthen SetClusSpec because pair could not be mapped to any 
identifier, id, in SetClusSpec such that formulae involving pair in Spec2 could be translated 
into formulae in Spec1 with id substituted in for pair. This example motivated our choosing 
the definition of strength as given since we intuitively believe that adding a constructor that 
does not change the invariant of a type should not strengthen the cluster specification. 


Adding to Spec1 a create procedure specification that takes in no arguments and 
returns an empty set makes Spec1 and Spec2 incomparable. One might think that by the 
addition of create Th(Spec2) would be strictly larger than Th(Spec1) and so Th(Spec2) would 
be stronger than Th(Spec1). This is not true, however, since the formula, true{S}∀s:set 
card(s+) > 0, which is in Th(Spec1), is not in Th(Spec2) and the formula, true{S}∃s:set 
card(s+) = 0, which is in Th(Spec2), is not in Th(Spec1). This example illustrates a perhaps 
surprising consequence of our definition. Intuitively, we would think that adding a constructor 
that increases the value set of a type should strictly strengthen the cluster specification. 
Strength, however, is defined in terms of theories, i.e., what is derivable from specifications, 
and not in terms of the "expressive" power of specifications.¹³ 


As an example of adding a constructor that weakens the strength of a specification, 
consider a stack[elem] cluster specification, Spec1, that has a pop procedure specification 
that returns a new stack whose value is that of the input stack with the top element removed. 
Let an invariant of Spec1 be that no stack object is mutated. Adding a mutating constructor, 
shrink, that mutates the input stack by removing the top element invalidates that invariant. 


Adding a "pure observer" can strengthen a cluster specification or leave it unchanged. 
It cannot weaken the original cluster specification nor make the original and new 
specifications incomparable. Adding an observer can at most add formulae of the form 
P{Pr}Q to the theory of a cluster specification. Since hypotheses of type induction rules deal 
with only constructors, adding an observer has no effect on the type induction rules of the 
cluster specification. Hence, the addition of a (pure) observer cannot weaken or invalidate 
any of the rules. 


13. This observation suggests pursuing the definition of a different property of specifications that might be related to 
"expressive-completeness" [Kapur80b]. 


As an example of strengthening with an observer, consider adding a size procedure 
specification to a stack[elem] cluster specification that has only constructors. Doing so adds 
theorems about integers to Th(stack[elem]). As an example of leaving the strength 
unchanged, suppose stack[elem] has null, push, and top, where top mutates its stack 
argument. Adding a read procedure specification that is like top, except that it does not 
mutate its stack argument, does not change the strength of the original specification. 
5.3 Essentiality 


In the construction of a specification, we often want it to be "minimal" in a given 
context. That is, we would like to be able to pare down a specification to just the "essential part" 
necessary for a desired set of properties to hold. Removing parts that have been shown to be 
inessential gives us a way of paring down a specification. 


A part, P, of a specification, Spec, is inessential for a theory, T, if Spec with P removed 
can still be used to deduce the theorems in T. We say "P is an inessential part of Spec for T." 
Identifying a part of a specification that is inessential to prove a property means that we can 
freely remove or alter that part of the specification and still be ensured that the desired 
property holds. On the other hand, if we were to change some part that is essential then we 
might have to reverify that the property holds. 


Whereas checking for properties defined in Section 5.1 is performed on a single 
specification, checking essentiality and inessentiality is performed on two specifications and a 
theory, where the second specification is defined to be a "part" of the first. The 
signatures for checkers for essentiality and inessentiality are as follows: 


essential: specification, specification, theory → boolean 
inessential: specification, specification, theory → boolean 


In Section 5.3.1 we define essentiality and inessentiality by first defining what we mean 
by a part of a specification. In Section 5.3.2 we give some situations for when we might want 
to determine inessential parts of a specification. 
5.3.1 Definitions 


In the following discussion we treat a specification as a formal system, which is a set of 
symbols, a set of wff's, a set of axioms, and a set of rules. (See Chapter 3 for the 
correspondence between a specification and its formal system.) Thus, it makes sense to talk 
about the language (set of symbols and set of wff's), axioms, and rules of a specification. For 
a specification, Spec = <L, A, R>, L is its language, A is its set of axioms and R is its set of 
rules. 


Def: A part of Spec is a specification with a language, L' ⊆ L, a set of axioms, A' ⊆ A, and a set 
of rules, R' ⊆ R. 


Examples of parts of a specification are the used trait of a procedure or cluster 
specification, and each of the bound procedure specifications of a cluster specification. 
Notice also that the type induction principle is a part of a cluster specification. Let two parts 


of Spec be P1 = <L1,A1,R1> and P2 = <L2, A2, R2>. 


Equal: P1 = P2 if and only if L1 = L2, A1 = A2, and R1 = R2. 
Subset: P1 ⊆ P2 if and only if L1 ⊆ L2, A1 ⊆ A2, and R1 ⊆ R2. 
Proper subset: P1 ⊂ P2 if and only if P1 ⊆ P2 but P1 ≠ P2. 


Difference: (Spec - P1) is the specification whose language is (L - L1), whose 
set of axioms is (A - A1), and whose set of rules is (R - R1). 


We require that subsets of sets of axioms and sets of rules are well-formed. For example, if L1 
⊆ L2, all axioms in A2 and all hypotheses and conclusions of rules in R2 are restricted to be in 
L2. Notice that P1 ⊆ P2 does not imply Th(P1) ⊆ Th(P2). 


Let P be a part of a specification, Spec. Let T be a theory such that each formula in T is 
deducible from Spec. We write this "Spec ⊢ T." 


Def: P is an inessential part of Spec for T if and only if (Spec - P) ⊢ T. 


Def: An inessential part P of Spec for T is maximal if no part properly containing P is 
inessential. 


Notice that there can be more than one maximal inessential part of a specification for a given 
theory. 


Def: P is an essential part of Spec for T if and only if (Spec - P) is a maximal inessential part of 
Spec for T. 


Checking for essentiality or inessentiality must be done with respect to a theory since a 
part of a specification that is essential for one theory might be inessential for a different 
theory. Given a theory, T, if a part, P, of a specification, Spec, is purported to be inessential 
for T, then one method for checking the inessentiality of P would be to remove P from Spec 
and check if the remaining specification is strong enough to prove each theorem in T. If each 
theorem in T is provable from (Spec - P), then P is inessential. If there is some theorem in T 
such that it is not provable from (Spec - P), then some subset of P is essential for T. 
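

The checking method just described can be sketched in Python for a toy formal system. The 
sketch is only an illustration: formulas are opaque strings, rules are simple 
premises-to-conclusion pairs closed by forward chaining, and the language component L is 
ignored. A real checker would need a theorem prover in place of the theorems function. The 
axiom and rule names are hypothetical and merely echo the set example used in this chapter. 

```python
# Toy model: formulas are strings, axioms are a set of formulas, and each rule is
# a pair (premises, conclusion). All names are hypothetical.

def theorems(axioms, rules):
    """Forward-chain the rules from the axioms until nothing new is derived."""
    known = set(axioms)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and set(premises) <= known:
                known.add(conclusion)
                changed = True
    return known


def is_inessential(part_axioms, part_rules, axioms, rules, theory):
    """P is inessential for T if (Spec - P) still proves every formula in T."""
    remaining_axioms = set(axioms) - set(part_axioms)
    remaining_rules = [rule for rule in rules if rule not in part_rules]
    return set(theory) <= theorems(remaining_axioms, remaining_rules)


axioms = {"has_axioms", "card_axioms"}
rules = [
    (("has_axioms",), "has_delete_property"),
    (("card_axioms",), "card_is_nonnegative"),
]
theory = {"has_delete_property"}

print(is_inessential({"card_axioms"}, [], axioms, rules, theory))   # True
print(is_inessential({"has_axioms"}, [], axioms, rules, theory))    # False
```

In this toy system the axioms about card can be removed without losing the desired property, 
while removing the axioms about has cannot. 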
5.3.2 Situations for Determining Inessentiality 


Here are three situations in which it would be useful to determine whether a part of a 
specification is inessential. One situation is to check if some part of a specification is 
inessential to prove some property of the specification itself. For example, we might want to 
know what part of a specification is inessential to proving it is fully-covering or deterministic. 
We might want to make a specification weaker, but ensure that it is still fully-covering or 


deterministic. 


A second situation is to check if some part of a specification is inessential to prove 
particular properties of its specificand set. For example, suppose we want to determine if 
some part of our trait for sets is inessential for proving the property, has(delete(s,i),j) = ~(i.eq 
j) ∧ has(s,j). We see, in particular, that the axioms about card are inessential to prove it. 
Another example of this second situation is to determine what part of a trait is inessential to 
establishing one of the hypotheses of a type induction rule associated with a cluster 
specification. For example, in Chapter 3 when we showed the property that the size of all set 
objects is greater than zero (for sets as defined by SetClusSpec), we used the property from 
the SetOfint trait that the cardinality of values of set objects is greater than or equal to zero. In 
this case, sort induction is essential, but, for instance, axioms about delete are not. 


A third situation is to determine what part of a specification is inessential in the proof of 
satisfaction between an implementation, Imp, and a specification, Spec. Let T be {Imp 
satisfies Spec}. Suppose in showing T we use a specification S, whose theory is a subset of 
Th(Imp). We might be interested in knowing what part of S is inessential, that is, not needed 
to prove T. In knowing what part of S is inessential to the proof of satisfaction, we can change 
that part of S and be guaranteed that Imp still satisfies Spec. 
6. Conclusions, Contributions, and Further Work 
6.1 Summary of Conclusions and Contributions 


In Chapter 1 we observed that at present formal specifications are difficult to write and 
to apply in the design of software. We believe that the two-tiered approach presented in this 


thesis is one step toward a solution to this problem. 


Our presentation included an approach to writing specifications, a specification 
language, and some ways to evaluate specifications. The approach separates the 
specification of state transformations and target programming language dependencies from 
the specification of underlying abstractions. The language supports this approach and was 
designed with the programmer in mind. The ways to evaluate specifications, i.e., checking 
and comparing, give a specifier means of convincing himself that his specification reflects his 
understanding of the problem statement. The distinguishing aspects of our solution are (1) 
the separation of concerns in the specification approach, (2) the incorporation of 
programming language dependencies in the specification language, and (3) a theory-oriented 
framework that provides a basis to reason about specifications independently of their 
underlying models. 
The four main contributions of this thesis are: 


1. The rigorous semantics for the two-tiered approach, 
2. The design of a CLU interface language, 


3. A framework for reasoning about two-tiered specifications and 
what they specify, and 


4. Exploiting the framework for evaluating specifications. 


The complex part of doing the semantics was in carefully fitting the two tiers together, 
and at the same time, keeping the separation clean. Mathematical entities such as algebras 
and relations serve as a basis for defining our model-oriented semantics. Although the 
models chosen are motivated by CLU, they can be used to model the semantics of interface 
languages for other programming languages. The models are relatively independent of 
Larch. 


The key contribution behind the design of the interface specification language for CLU 
was isolating programming language dependencies into one component of a specification. In 
doing so, we shed light on what aspects of a programming language should show through to 
an interface specification language, and on what aspects were complex to handle (e.g., own 
variables). Another related contribution is the factorization of the presentation of the 
interface language into a kernel part and an extended part. Although we presented a design 
targeted for a particular programming language, we believe it is general enough to be 
adapted for others. 


We also defined a proof-theoretic framework for reasoning about specifications. This 
reflected the same clean separation between the two tiers as the model-oriented semantics. It 
was designed to allow one to reason about what is being specified completely in terms of the 
text of the specifications. This advantage is especially significant if one has appropriate 


machine support, e.g., a theorem prover. 


In exploring the utility of this framework, we defined some sample properties of 
specifications and ways to compare them. In making these definitions, we illustrated how to 
state their definitions within the proof-theoretic framework. Identifying these properties is of 
concern to a specifier who wants to know if some developing specification is getting "better." 
Experimentation is needed to see if we have focused on the right properties, but we have 
provided here at least some of the properties that might be of use to a specifier, and an 
indication of how to define them. 
6.2 Directions for Further Work 


We first discuss two areas of "basic" research: developing other interface languages 
and evaluating collections of specifications. We then discuss two areas of "experimental" 
research: building machine support and applying the two-tiered approach to examples. 
6.2.1 Development of Interface Languages 


One test of our two-tiered approach is to develop interface languages for other 
programming languages, both sequential and concurrent. We have not discussed 
concurrency at all in this thesis, and would be interested to see how easily the kernel interface 
language can be extended to handle concurrent programming issues. A first step to take is to 
extend our model to concurrent programming and then add syntactic extensions to the kernel 
language. Stark [Stark83] defines a model of the behavior of concurrent systems, which 
could serve as a reasonable basis for such a specification language. Jones extends his own 


work for sequential programs to concurrent ones [Jones81]. 


Development of interface languages for other sequential programming languages is 
currently being done for Cedar Mesa [Horning83]. Its design borrows directly from the kernel 


language we defined in Chapter 2. 


Finally, we mention with hindsight a change we might make to the CLU interface 
language. Instead of giving two assertions in a procedure specification, since they are both 
interpreted with respect to two states, we could give only one assertion [Horning83, Yelick83]. 
Hence, instead of writing a pair, <pre, post>, in the body of a procedure specification, we write 
a single assertion. We also mention an obvious extension to the language. Instead of listing a 
single used trait in a uses clause of a procedure or cluster specification, we can list a set of 
used traits. Furthermore, we can perform operations on each of the traits in the list, e.g., 
renaming and inclusion. This extension does not change the semantics of a procedure or 
cluster specification because a single trait can be defined to include (i.e., includes in Larch) 
each trait in a set of traits. 
6.2.2 Evaluating Collections of Specifications 


In Chapter 5, we concentrated on individual specifications, and not at all on collections 
of specifications. As a collection of specifications grows, the issue of evaluating it becomes 
just as important as, and probably harder than, evaluating each of its individual components. 
We briefly mention some relations among specifications that are easily derived from the 


formalism we have described for the interface language. 


A specifier usually has in mind some structure among the mass of specifications written. 
Depicting this structure is good practice in the design of a large specification as well as good 
documentation for the reader. For example, we define uses to be a relation on a collection of 
specifications, where a specification, Spec, uses a trait, Tr, if Tr is Spec's used trait. Similarly, 
we define imports to be a relation on a collection of specifications, where a specification, 
Spec, imports a cluster specification, Clus, if Spec imports the type defined by Clus. 


These relations indicate global, or interconnection complexity, as opposed to the local 
complexity that can be seen in individual specification units. Evaluating the complexity of 
each of these kinds of relations can give the reader and writer of specifications an idea of the 
complexity of the specification. We might treat the relation associated with each of these 
kinds of specifications as a graph and then analyze the complexity of the specification in 
terms of properties of the graph. Some properties to check of a graph ate whether it is 
acyclic, whether it is hierarchical (no sharing), or whether it is a tree (one root, no sharing). 
Whether a property is desirable or not would depend on the use of the specification. For 
example, one can argue that in writing a good specification one should have a uses relation 
that has a lot of sharing of the used traits to avoid repetition and to reuse work already done. 
On the other hand, care must be taken when changes are made to a shared trait; a
specification with a hierarchical uses relation might be easier to modify.
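
The graph properties named above can be checked mechanically. The following is a minimal
sketch in Python, assuming the uses (or imports) relation is given as a set of (source, target)
pairs. The function names and the example specification names are hypothetical and are not
part of the interface language.

```python
from collections import defaultdict


def _nodes(edges):
    """All specification names mentioned in the relation."""
    return {name for edge in edges for name in edge}


def is_acyclic(edges):
    """True if the relation, viewed as a directed graph, has no cycle."""
    succ = defaultdict(set)
    for src, dst in edges:
        succ[src].add(dst)
    done, on_path = set(), set()

    def visit(node):
        if node in on_path:          # we came back to a node on the current path
            return False
        if node in done:
            return True
        on_path.add(node)
        ok = all(visit(nxt) for nxt in succ[node])
        on_path.discard(node)
        done.add(node)
        return ok

    return all(visit(node) for node in _nodes(edges))


def has_sharing(edges):
    """True if some target is used (or imported) by more than one source."""
    sources = defaultdict(set)
    for src, dst in edges:
        sources[dst].add(src)
    return any(len(srcs) > 1 for srcs in sources.values())


def is_hierarchical(edges):
    """Acyclic with no sharing, following the reading given in the text."""
    return is_acyclic(edges) and not has_sharing(edges)


def is_tree(edges):
    """Hierarchical with exactly one root (a node that nothing points to)."""
    roots = _nodes(edges) - {dst for _, dst in edges}
    return is_hierarchical(edges) and len(roots) == 1


# Hypothetical uses relation: two specifications, each with its own trait.
uses = {("SetClusSpec", "SetOfInt"), ("StackClusSpec", "StackOfInt")}
```

With this example relation, `is_hierarchical(uses)` holds but `is_tree(uses)` does not, since
there are two roots. Adding a second specification that uses SetOfInt introduces sharing.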
6.2.3 Machine Support


The limited experience we have had in writing specifications makes evident the need for
machine-support. Without machine-support, we have no hope of expecting either specifiers
to write or programmers to use specifications, except as an academic exercise.


Minimally, machine-support should provide ways to manage the text of specifications;
ideally, it should provide ways to reason about their meaning as well. Our list of tools includes
(see [Guttag82]):


1. A syntax checker. 


2. A library. Both traits and interface specifications, and both
problem-independent and problem-dependent specifications, should be
included. Traits should be included for possible reuse; interfaces,
primarily to provide examples.


3. An editor. A syntax-directed, interactive editor should supply 
templates, generate redundant information, and keep track of 
missing information. 


4. A semantic checker. Theorem proving technology can be applied 
to the manipulation of specifications for checking properties of both 
specifications and what they specify. Much work remains in finding 
algorithms and heuristics that check for these properties. 


The Larch project at M.I.T. has started on the development of these tools as part of a
specification environment. Included in this development effort are implementations of a
syntax and static semantics checker [Kownacki83] and a semantic checker that can
manipulate equations in traits [Lescanne83, Forgaard83], and designs of a library [Atreya82]
and a syntax-directed editor [Zachary83].
6.2.4 Experimentation


The two-tiered approach needs to be tested on realistic examples of substantial size.
We can test the utility of the formal framework we set up only by trying it out. In doing so, we
can then evaluate whether the two-level partitioning is good, whether it makes it easier to read
and write specifications, and whether it leads to better specifications. We can also see
whether the separation of concerns leads to a better understanding of the specificands.


We may discover that we need to make changes to the design of the interface language.
Identifying the language constructs that are used frequently, those that are rarely used, and
those that would be nice to have in order to enhance expressibility can help in the design of
future interface languages.


We also need to discover other ways to evaluate a specification, other properties and
qualities, and ways to analyze tradeoffs among them. We should test whether the properties
we have discussed or variations of them are of any use or interest to a specifier. We should
see under what circumstances a specifier tends to perform evaluation and classify what kinds
of changes to a specification are made as a result of evaluation.


Finally, with more experimentation, we hope to show the utility of using formal
specifications; in particular, to demonstrate that forcing precision in the design process has a
beneficial effect on the overall programming process.
Appendix I - Interface and Trait Specifications


Equivalence: trait

    introduces
        eq: E, E → Bool

    constrains [eq] so that for all [x, y, z: E]
        eq(x,x) = true
        eq(x,y) = eq(y,x)
        ((eq(x,y) ∧ eq(y,z)) ⇒ eq(x,z)) = true


Figure 3. Equivalence Trait


SetOfE: trait

    includes Integer, Equivalence

    introduces
        empty: → C
        add: C, E → C
        remove: C, E → C
        has: C, E → Bool
        isEmpty: C → Bool
        card: C → Int

    closes C over [empty, add]

    constrains [C] so that for all [s: C, e, e1: E]
        remove(empty, e) = empty
        remove(add(s,e), e1) = if eq(e,e1) then remove(s,e1) else add(remove(s,e1),e)
        has(empty, e) = false
        has(add(s,e), e1) = if eq(e,e1) then true else has(s,e1)
        isEmpty(empty) = true
        isEmpty(add(s,e)) = false
        card(empty) = 0
        card(add(s,e)) = if has(s,e) then card(s) else 1 + card(s)


SetOfInt: trait
    includes SetOfE with [SI for C, Int for E]


Figure 4. SetOfE and SetOfInt Traits
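
To see how the equations of SetOfE determine the value of each operator on terms built from
empty and add, the trait can be transcribed into a small executable model. The sketch below is
an illustration in Python, not part of the Larch shared language. It assumes that E is
instantiated with integers, that eq is ordinary integer equality, and that the class and function
names (Empty, Add, is_empty, and so on) are hypothetical stand-ins for the trait's symbols.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Empty:
    """The term empty."""


@dataclass(frozen=True)
class Add:
    """The term add(s, e)."""
    s: "Term"
    e: int


Term = Union[Empty, Add]


def eq(x: int, y: int) -> bool:
    # Assumption: the equivalence on E is integer equality.
    return x == y


def remove(s: Term, e1: int) -> Term:
    if isinstance(s, Empty):
        return Empty()                       # remove(empty, e) = empty
    if eq(s.e, e1):
        return remove(s.s, e1)               # drop this occurrence, keep going
    return Add(remove(s.s, e1), s.e)


def has(s: Term, e1: int) -> bool:
    if isinstance(s, Empty):
        return False                         # has(empty, e) = false
    return True if eq(s.e, e1) else has(s.s, e1)


def is_empty(s: Term) -> bool:
    return isinstance(s, Empty)


def card(s: Term) -> int:
    if isinstance(s, Empty):
        return 0                             # card(empty) = 0
    # A repeated element does not increase the cardinality.
    return card(s.s) if has(s.s, s.e) else 1 + card(s.s)


# add(add(add(empty, 1), 2), 1): the element 1 is added twice.
example = Add(Add(Add(Empty(), 1), 2), 1)
```

Each function has one clause per generator, mirroring the "closes C over [empty, add]" clause:
every value of sort C is denoted by some term built from these two operators.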
set = cluster is singleton, union, delete, size
    uses SetOfInt
    provides mutable set from SI


    singleton = proc (i: int) returns (s: set)
        uses SetOfInt
        pre true
        post s↓ = add(empty, i↑) ∧ new s ∧ mutates ∅ ∧ returns
    end


    union = proc (s1, s2: set) returns (s3: set)
        uses SetOfInt
        pre true
        post ∀i:Int [has(s3↓,i) = has(s1↑,i) ∨ has(s2↑,i)]
             ∧ new s3 ∧ mutates ∅ ∧ returns
    end


    delete = proc (s: set, i: int) signals (emptiesSet)
        uses SetOfInt
        pre true
        post [((card(s↑) ≥ 2) ∨ ~has(s↑,i↑)) ⇒
                 (s↓ = remove(s↑,i↑) ∧ mutates s ∧ returns)] ∧
             [((card(s↑) .eq 1) ∧ has(s↑,i↑)) ⇒
                 mutates ∅ ∧ signals emptiesSet] ∧
             new ∅
    end


    size = proc (s: set) returns (i: int)
        uses SetOfInt
        pre true
        post i↓ = card(s↑) ∧ new ∅ ∧ mutates ∅ ∧ returns
    end
end


Figure 9. Set Cluster Specification (SetClusSpec)
stack = cluster is empty, grow, read
    uses StackOfInt
    provides mutable stack from StkI


    empty = proc () returns (st: stack)
        pre true
        post st↓ = null ∧ new st
    end


    grow = proc (st: stack, i: int)
        pre true
        mutates st
        post st↓ = push(st↑, i↑)
    end


    read = proc (st: stack) returns (i: int)
        pre ~isNull(st↑)
        post i↓ = top(st↑)
    end


end stack


Figure 12. Stack Cluster Specification
StackOfInt: trait
    includes StackOfE with [StkI for C, Integer for E]


StackOfE: trait

    includes Integer

    introduces
        null: → C
        push: C, E → C
        top: C → E
        pop: C → C
        isNull: C → Bool
        isIn: C, E → Bool
        size: C → Int

    closes C over [null, push]

    constrains [C] so that for all [s: C, e, e1: E]
        top(null) exempt
        top(push(s,e)) = e
        pop(null) exempt
        pop(push(s,e)) = s
        isNull(null) = true
        isNull(push(s,e)) = false
        isIn(null,e) = false
        isIn(push(s,e),e1) = if e .eq e1 then true else isIn(s,e1)
        size(null) = 0
        size(push(s,e)) = size(s) + 1


Figure 13. Traits for Stacks
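
The role of the exempt terms can be illustrated with a small executable model of StackOfE. This
is a sketch in Python under stated assumptions: E is instantiated with integers, and an exempt
term is modeled by raising a hypothetical Exempt exception. The trait itself only says that no
equation constrains such a term; treating it as an error is a choice made here for illustration.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Null:
    """The term null."""


@dataclass(frozen=True)
class Push:
    """The term push(s, e)."""
    s: "Stack"
    e: int


Stack = Union[Null, Push]


class Exempt(Exception):
    """Raised for terms the trait lists as exempt (an assumption of this model)."""


def top(s: Stack) -> int:
    if isinstance(s, Null):
        raise Exempt("top(null)")
    return s.e                      # top(push(s,e)) = e


def pop(s: Stack) -> Stack:
    if isinstance(s, Null):
        raise Exempt("pop(null)")
    return s.s                      # pop(push(s,e)) = s


def is_null(s: Stack) -> bool:
    return isinstance(s, Null)


def is_in(s: Stack, e1: int) -> bool:
    if isinstance(s, Null):
        return False
    return True if s.e == e1 else is_in(s.s, e1)


def size(s: Stack) -> int:
    return 0 if isinstance(s, Null) else size(s.s) + 1


# push(push(null, 1), 2)
example = Push(Push(Null(), 1), 2)
```

The pre-condition ~isNull(st↑) of read in Figure 12 plays the matching role at the interface
level: a caller that respects it never reaches the exempt term top(null).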
Appendix II - Proofs


II.1. Validity of a Type Induction Rule
For the predicate,
P(t) = ~isNull(t↑) ⇒ card(top(t↑)↑) < 64,
we show the validity of the three hypotheses of the following type induction rule.
Hypotheses:


HB   true {empty} ~isNull(st↓) ⇒ [card(top(st↓)↓) < 64]

HP   ~isNull(s1↑) ⇒ [card(top(s1↑)↑) < 64] {grow}
     ~isNull(s2↓) ⇒ [card(top(s2↓)↓) < 64]

HM   s = top(v1↑) ∧ ~isNull(v1↑) ⇒ [card(top(v1↑)↑) < 64] {delete}
     ~isNull(v1↓) ⇒ [card(top(v1↓)↓) < 64]


Conclusion: true {S} ∀t:stack[set] ~isNull(t↓) ⇒ card(top(t↓)↓) < 64 for all
Proof: 


1. HB: true {empty} ~isNull(st↓) ⇒ [card(top(st↓)↓) < 64]
Th(empty) gives the axiom true {empty} empty.post(st),
where empty.post(st) = st↓ = null ∧ new st ∧ mutates ∅ ∧ returns


empty.post(st) ⇒ P[st/t] is valid because
st↓ = null ⇒ [~isNull(st↓) ⇒ card(top(st↓)↓) < 64],
which is true since ~isNull(st↓) is false.


HB is valid by the rule of consequence.


2. HP: ~isNull(s1↑) ⇒ [card(top(s1↑)↑) < 64] {grow}
       ~isNull(s2↓) ⇒ [card(top(s2↓)↓) < 64]


Assume ~isNull(s1↑) ⇒ card(top(s1↑)↑) < 64.
We have the axiom, card(s↑) < 64 {grow} grow.post(s1, s2, s),
where grow.post(s1, s2, s) =
    s2↓ = push(s1↑,s) ∧ new s2 ∧ mutates ∅ ∧ returns


We have that card(s↑) < 64
⇒ card(s↓) < 64, from mutates ∅
⇒ card(top(push(s1↑,s))↓) < 64, from Th(StackOfSS)
⇒ card(top(s2↓)↓) < 64, from substitution for s2↓ from grow.post(s1, s2, s)
⇒ [~isNull(s2↓) ⇒ [card(top(s2↓)↓) < 64]] (a weaker assertion)


HP is valid by the rule of consequence.


3. HM: s = top(v1↑) ∧ ~isNull(v1↑) ⇒ [card(top(v1↑)↑) < 64] {delete}
       ~isNull(v1↓) ⇒ [card(top(v1↓)↓) < 64]
Assume ~isNull(v1↑) ⇒ [card(top(v1↑)↑) < 64]. The post-condition of delete is:
[((card(s↑) ≥ 2) ∨ ~has(s↑,i↑)) ⇒
    (s↓ = remove(s↑,i↑) ∧ mutates s ∧ returns)] ∧
[((card(s↑) .eq 1) ∧ has(s↑,i↑)) ⇒
    mutates ∅ ∧ signals emptiesSet] ∧
new ∅


Assume ~isNull(v1↑). With the term top(v1↑) substituted in for s, we have:


(a) ((card(top(v1↑)↑) ≥ 2) ∨ ~has(top(v1↑)↑,i↑)) ⇒
    [top(v1↑)↓ = remove(top(v1↑)↑,i↑) ∧ mutates top(v1↑) ∧ returns]


Since card(top(v1↑)↑) < 64 (from the assumptions),
card(remove(top(v1↑)↑,i↑)) < 64 by Th(SetOfInt),
card(top(v1↑)↓) < 64 by substitution,
card(top(v1↓)↓) < 64 since the object v1 is not mutated.
(Only top(v1↑) is possibly mutated.)


(b) ((card(top(v1↑)↑) .eq 1) ∧ has(top(v1↑)↑,i↑)) ⇒
    card(top(v1↑)↑) .eq 1 ∧ mutates ∅ ∧ signals emptiesSet


Since card(top(v1↑)↑) < 64 (again, from the assumptions),
card(top(v1↓)↓) < 64, from mutates ∅.


HM is valid by the rule of consequence.


II.2. Proof of Satisfaction


We now give an example of a cluster that satisfies a cluster specification. Figure 18
gives a set cluster specification. Figure 19 gives an implementation of this cluster
specification. The implementation uses the rep type, array[int], for which a cluster
specification is given in Figure 20. The ArrayOfInt trait used to define the array[int] type is
given in Figure 21.
set = cluster is create, insert, size, member
    uses SetOfInt
    provides mutable set from SI


    create = proc () returns (s: set)
        pre true
        post s↓ = empty ∧ new s ∧ mutates ∅ ∧ returns
    end


    insert = proc (s: set, i: int)
        pre true
        post s↓ = add(s↑,i) ∧ new ∅ ∧ mutates s ∧ returns
    end


    size = proc (s: set) returns (i: int)
        pre true
        post i↓ = card(s↑) ∧ new ∅ ∧ mutates ∅ ∧ returns
    end


    member = proc (s: set, i: int) returns (b: bool)
        pre true
        post has(s↑, i) = b↓ ∧ new ∅ ∧ mutates ∅ ∧ returns
    end member


end


Figure 18. Set Cluster Specification


We sketch the proof of satisfaction below. We prefix procedure names by "T$" to 
distinguish them from trait function names. We expect machine tools to aid the implementor 
in performing much of the symbol manipulation found in these kinds of proofs [Boyer79, 
Good75, Good78, Musser77, Musser80]. 

1. Let the abstraction function be:
A: TtoS(array[int]) → TtoS(set)


A(a) = if size(a) = 0 then empty
       else if size(a) > 0 then add(A(remh(a)), top(a))


2. The rep invariant, RI(a), is:


∀a:AI [low(a) = 1 ∧ size(a) ≥ 0 ∧ NoDups(a)],
where NoDups(a) = ∀i,j [fetch(a,i) = fetch(a,j) ⇒ i = j].
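
The abstraction function and the rep invariant can be read as ordinary functions on array
values. The sketch below, in Python, is an illustration under several assumptions: an array
value is modeled as a low bound plus a tuple of elements, abstract set values are modeled as
Python frozensets, top(a) is taken to be the highest-indexed element, the size bound is read as
non-strict (so that a freshly created array satisfies the invariant), and NoDups quantifies only
over in-bounds indices because out-of-bounds fetch terms are exempt. The names Array,
abstraction, and rep_invariant are hypothetical.

```python
from typing import FrozenSet, NamedTuple, Tuple


class Array(NamedTuple):
    """An array value: a low bound and the elements in index order."""
    low: int
    elems: Tuple[int, ...]


def size(a: Array) -> int:
    return len(a.elems)


def remh(a: Array) -> Array:
    """Remove the high element (only used on non-empty arrays)."""
    return Array(a.low, a.elems[:-1])


def top(a: Array) -> int:
    """Assumed meaning of top(a): the element at the high bound."""
    return a.elems[-1]


def fetch(a: Array, i: int) -> int:
    return a.elems[i - a.low]


def abstraction(a: Array) -> FrozenSet[int]:
    """A(a): empty for size 0, otherwise add(A(remh(a)), top(a))."""
    if size(a) == 0:
        return frozenset()
    return abstraction(remh(a)) | {top(a)}


def no_dups(a: Array) -> bool:
    """Equal fetched values imply equal indices, over in-bounds indices."""
    indices = range(a.low, a.low + size(a))
    return all(fetch(a, i) != fetch(a, j) or i == j
               for i in indices for j in indices)


def rep_invariant(a: Array) -> bool:
    return a.low == 1 and size(a) >= 0 and no_dups(a)
```

For example, `abstraction(Array(1, (3, 5, 7)))` is the set {3, 5, 7}, while `Array(1, (3, 3))`
violates the invariant because the same value appears at two indices.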
set = cluster is create, insert, size, member


    rep = array[int]


    create = proc () returns (cvt)
        return (rep$create(1))
    end create


    insert = proc (c: cvt, i: int)
        % add i only if it is not already present, to preserve NoDups
        if ~member(up(c), i) then rep$addh(c,i) end
    end insert


    size = proc (c: cvt) returns (int)
        return(rep$size(c))
    end size


    member = proc (c: cvt, i: int) returns (bool)
        k: int := rep$low(c)
        % scan every index from low to high, inclusive
        while k <= rep$high(c) do
            if i = rep$fetch(c,k) then return(true) end
            k := k + 1
        end
        return(false)
    end member


end set


Figure 19. Implementation of the Set Cluster Specification
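
As an informal complement to the proof, the implementation can be transliterated and its
behavior compared against the post-conditions of Figure 18. The following Python sketch is
not CLU and makes several assumptions: the rep is a Python list with low bound 1, the
abstraction function maps the rep to a frozenset, and the loop in member runs up to and
including the high bound, matching the termination condition k = high(c) + 1 used in the
proof of set$member. The class name IntSet and its helper methods are hypothetical.

```python
class IntSet:
    """A set of integers represented by an array with low bound 1."""

    def __init__(self):
        # Corresponds to rep$create(1): empty array, low bound 1.
        self._low = 1
        self._rep = []

    def _high(self):
        return self._low + len(self._rep) - 1

    def _fetch(self, k):
        return self._rep[k - self._low]

    def member(self, i):
        k = self._low
        while k <= self._high():          # inclusive upper bound
            if i == self._fetch(k):
                return True
            k += 1
        return False

    def insert(self, i):
        if not self.member(i):            # keeps the rep free of duplicates
            self._rep.append(i)           # corresponds to rep$addh(c, i)

    def size(self):
        return len(self._rep)

    def abstract(self):
        """The abstraction function, with sets modeled as frozensets."""
        return frozenset(self._rep)


def check_insert(s, i):
    """Check insert's post-condition: the new value is add(old value, i)."""
    before = s.abstract()
    s.insert(i)
    return s.abstract() == before | {i}
```

Running `check_insert` over a sequence of insertions exercises both cases of the proof of
set$insert: the element is either new, in which case the rep grows, or already a member, in
which case the rep is unchanged.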


3. For each procedure in the set cluster we must show it satisfies its corresponding procedure
specification in the set cluster specification under A. For our simple example, in most cases
this reduces to showing that the post-condition of some procedure specification of the
array[int] cluster specification implies the post-condition of the corresponding procedure
specification of the set cluster specification. We also need to show that the rep invariant
holds for each procedure of the set cluster implementation.


3.1. set$create: Let c↓ = create(1) from array[int]'s create.post. Show that s↓ = empty.


s↓ = A(c↓)
   = A(create(1)) by substitution
   = empty by the definition of A, since size(create(1)) = 0.
array[int] = cluster is create, addh, size, low, high, fetch
    uses ArrayOfInt
    provides mutable array[int] from AI


    create = proc (i: int) returns (a: array[int])
        pre true
        post a↓ = create(1) ∧ new a ∧ mutates ∅ ∧ returns
    end


    addh = proc (a: array[int], i: int)
        pre true
        post a↓ = addh(a↑,i) ∧ new ∅ ∧ mutates a ∧ returns
    end


    size = proc (a: array[int]) returns (i: int)
        pre true
        post i↓ = size(a↑) ∧ new ∅ ∧ mutates ∅ ∧ returns
    end


    low = proc (a: array[int]) returns (i: int)
        pre true
        post i↓ = low(a↑) ∧ new ∅ ∧ mutates ∅ ∧ returns
    end


    high = proc (a: array[int]) returns (i: int)
        pre true
        post i↓ = high(a↑) ∧ new ∅ ∧ mutates ∅ ∧ returns
    end


    fetch = proc (a: array[int], i: int) returns (j: int) signals (bounds)
        pre true
        post [low(a↑) ≤ i ≤ high(a↑) ⇒ (j↓ = fetch(a↑,i) ∧ mutates ∅ ∧ returns)] ∧
             [(i < low(a↑) ∨ i > high(a↑)) ⇒ (signals bounds ∧ mutates ∅)]
             ∧ new ∅
    end


end array[int]


Figure 20. Array Cluster Specification


We know that s is new since rep$create returns a new object, i.e., new c = new s. Since
rep$create does not mutate any object, the mutates ∅ assertion is true. Thus, the
post-condition of create is satisfied. We show that the rep invariant, RI(c↓), is established:
low(c↓) = low(create(1)) = 1, from Th(ArrayOfInt).
size(c↓) = size(create(1)) = 0, from Th(ArrayOfInt).
NoDups(c↓) = NoDups(create(1)) = ∀i,j:Int [fetch(c↓,i) = fetch(c↓,j) ⇒ i = j].
In Th(ArrayOfInt), fetch(create(x),y) is defined, but exempt.
Let v = fetch(create(1),i) and w = fetch(create(1),j).
ArrayOfInt: trait

    includes Array [AI for A, int_obj for E]

    introduces
        empty: AI → Bool
        size: AI → Int
        isin: AI, int_obj → Bool

    constrains [AI] so that for all [k: Int, i,j: int_obj, a: AI]
        empty(create(k)) = true
        empty(addh(a,i)) = false
        size(create(k)) = 0
        size(addh(a,i)) = size(a) + 1
        isin(create(k),j) = false
        isin(addh(a,i),j) = if i .eq j then true else isin(a,j)


Array: trait

    includes Integer, Elem

    introduces
        create: Int → A
        addh: A, E → A
        remh: A → A
        low: A → Int
        high: A → Int
        fetch: A, Int → E
        store: A, Int, E → A
        size: A → Int

    closes A over [create, addh]

    constrains [A] so that for all [i,i1,i2: Int, e,e1,e2: E, a: A]
        remh(create(i)) exempt
        remh(addh(a,e)) = a
        low(create(i)) = i
        low(addh(a,e)) = low(a)
        high(a) = low(a) + size(a) - 1
        fetch(create(i1),i2) exempt
        fetch(addh(a,e),i) = if i .eq (low(a) + size(a)) then e else fetch(a,i)
        store(create(i1),i2,e) exempt
        store(addh(a,e1),i,e2) = if i .eq (low(a) + size(a)) then addh(a,e2)
                                 else addh(store(a,i,e2),e1)
        size(create(i)) = 0
        size(addh(a,e)) = size(a) + 1


Figure 21. ArrayOfInt and Array Traits


Then v = w ⇒ i = j, and so NoDups(c↓) holds.


3.2. set$insert: Let s↑ = A(c↑). Show that s↓ = add(s↑, i).
Case 1: ~member(s↑, i)
Let c↓ = addh(c↑,i) from addh.post.
s↓ = A(c↓)
   = add(A(remh(c↓)), top(c↓))
   = add(A(remh(addh(c↑,i))), top(addh(c↑,i)))
   = add(A(c↑), i)
   = add(s↑, i)
Case 2: member(s↑, i)
   ⇒ has(s↑, i)
   ⇒ add(s↑, i) = s↑ from Th(SetOfInt)
s↓ = A(c↓)
   = A(c↑) since c↓ = c↑
   = s↑
   = add(s↑, i)


Since set$member (see 3.4 below) and rep$addh do not create new objects, the new ∅
assertion of insert's post-condition is true. The mutates assertion is true since the value of
the input set object, s, might be changed. Thus, the post-condition of insert is satisfied. We
show that the rep invariant is maintained:
low(c↓) = low(addh(c↑,i)) = low(c↑) = 1
size(c↓) = size(addh(c↑,i)) = 1 + size(c↑) ≥ 0, which is true since size(c↑) ≥ 0.
NoDups(c↓) = NoDups(addh(c↑,i))
= ∀j,k:Int [fetch(addh(c↑,i),j) = fetch(addh(c↑,i),k)]
= ∀j,k:Int [(if j = low(c↑) + size(c↑) then i else fetch(c↑,j)) =
            (if k = low(c↑) + size(c↑) then i else fetch(c↑,k))]
⇒ j = k since NoDups(c↑).


3.3. set$size: Let s↑ = A(c↑). Show that size(c↑) = card(s↑). We prove this by induction.
Case 1: c↑ = create(k)
size(c↑) = 0
         = card(empty)
         = card(A(c↑))
         = card(s↑)
Case 2: c↑ = addh(x,y). The induction hypothesis (IH) is size(x) = card(A(x)).
From NoDups, we know that ~isin(x,y).
From Lemma (below), ~isin(x,y) ⇒ ~has(A(x),y).
Show size(c↑) = card(s↑).
size(c↑) = 1 + size(x)
         = 1 + card(A(x)), by IH
         = card(add(A(x),y)) since ~has(A(x), y)
         = card(add(A(remh(addh(x,y))),top(addh(x,y))))
         = card(A(addh(x,y)))
         = card(A(c↑))
         = card(s↑)
Since rep$size neither creates new objects nor mutates existing ones, the new ∅ and
mutates ∅ assertions of size's post-condition are both true. Thus, the post-condition of size
is satisfied. We show that the rep invariant is maintained. Since rep$size mutates nothing,
c↓ = c↑,
low(c↓) = low(c↑) = 1,
size(c↓) = size(c↑) ≥ 0,
NoDups(c↓) = NoDups(c↑).


3.4. set$member: Let s↑ = A(c↑) and let b be the boolean returned by member. Show that
has(s↑,i) = b↓.
Case 1: empty(c↑) ⇒ (isin(c↑,i) = false) ⇒
        (has(A(c↑),i) = false), by Lemma below.
Case 2: The loop invariant is inbounds(k) and ∀d:Int low(c↑) ≤ d < k [fetch(c↑,d) ≠ i],
        where inbounds(k) = low(c↑) ≤ k↑ ≤ high(c↑)
Case 2.1: i = j
        At the return(true) statement we know
        that b↓ = true ∧ isin(c↑,i) = b↓.
        isin(c↑,i) ⇒ has(A(c↑),i) = has(s↑,i), by Lemma below.
Case 2.2: i ≠ j
        We increment k and go to the start of the loop.
        At termination of loop, k↑ = high(c↑) + 1 ∧
        ∀d:Int low(c↑) ≤ d < high(c↑) + 1 [fetch(c↑,d) ≠ i]
        = ∀d:Int low(c↑) ≤ d ≤ high(c↑) [fetch(c↑,d) ≠ i]
        ⇒ (isin(c↑,i) = false)
        ⇒ (has(A(c↑),i) = false), by Lemma below.


Since rep$low, rep$high, rep$fetch, and int$add neither create new objects nor mutate
existing ones, the new ∅ and mutates ∅ assertions of member's post-condition are both
true. Thus, the post-condition of member is satisfied. The rep invariant is maintained
because rep$low, rep$high, rep$fetch do not mutate any objects, and so c↓ = c↑, as in the
case for set$size.


Lemma: ∀x:AI [isin(x,i) = has(A(x),i)]
Pf: By sort induction.
Case 1: Let x = create(k)
        isin(x,i) = false
        has(A(create(k)),i) = has(empty,i) = false
Case 2: Let x = addh(y,k)
        isin(x,i)
        = isin(addh(y,k),i)
        = if i=k then true else isin(y,i)

        has(A(addh(y,k)),i)
        = has(add(A(y),k),i)
        = if i=k then true else has(A(y),i)

True, by induction. (Proof of lemma)
(Proof of set)
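
The lemma can also be spot-checked by enumerating small array terms. The Python sketch below
is a finite test and not a substitute for the induction above. It assumes that array and set
values are represented as nested tuples standing for the terms built from create/addh and
empty/add, that the element equivalence is integer equality, and that A(addh(y,k)) =
add(A(y),k) as in Case 2. All function names are hypothetical.

```python
from itertools import product


def isin(a, j):
    """isin from ArrayOfInt, on terms ("create", k) or ("addh", a, e)."""
    if a[0] == "create":
        return False
    _, rest, e = a
    return True if e == j else isin(rest, j)


def abstraction(a):
    """A, mapping array terms to set terms ("empty",) or ("add", s, e)."""
    if a[0] == "create":
        return ("empty",)
    _, rest, e = a
    return ("add", abstraction(rest), e)


def has(s, e1):
    """has from SetOfE, on set terms."""
    if s[0] == "empty":
        return False
    _, rest, e = s
    return True if e == e1 else has(rest, e1)


def arrays(max_len, values):
    """All array terms with at most max_len elements drawn from values."""
    for n in range(max_len + 1):
        for elems in product(values, repeat=n):
            a = ("create", 1)
            for e in elems:
                a = ("addh", a, e)
            yield a


def lemma_holds(max_len=3, values=(0, 1, 2)):
    """Check isin(x, i) = has(A(x), i) for every enumerated x and i."""
    return all(isin(a, i) == has(abstraction(a), i)
               for a in arrays(max_len, values)
               for i in values)
```

The enumeration includes arrays with duplicate elements, so the check does not depend on the
rep invariant, which agrees with the lemma being stated for all x of sort AI.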
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